From lizs at mail.uc.edu Wed Dec 1 00:15:07 2010 From: lizs at mail.uc.edu (Li, Zhisong (lizs)) Date: Wed, 1 Dec 2010 06:15:07 +0000 Subject: [petsc-users] What's the binary function corresponding to "PetscViewerASCIIPrintf"? Message-ID: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> Hi, Petsc Team, I once read that when we write large volume of data into an output data file, it's best to write in binary format rather than in ASCII format. I forget the origin of this statement and could not find it any more. What I found in the example code is about writing array into a binary file. Besides data stored in arrays, I also want to include some other info such as document title, data dimensions and line changing, which are not arrays, into the output file. So a "PetscViewerASCIIPrintf" is very convenient for doing this if we use ASCII format. But I could not find its correspondent function for binary format. Can you tell me how to do this? A sample code will be most helpful. Best Regards, Zhisong Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 1 08:06:19 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 08:06:19 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: Message-ID: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> --with-64-bit-indices=1 You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) ^^^^ ^^^^^ --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example PetscInt mone mone = 1 > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) but better just build PETSc without the --with-64-bit-indices Barry On Nov 30, 2010, at 10:33 PM, Peter Wang wrote: > I am trying to create a matrix and insert values to it. The martix is supposed to be as following: > > 1 0 0 0 > 0 2 0 0 > 0 0 3 0 > 0 0 0 4 > > array coef[] is the diagonal value of the matrix, > snr[] is the index of the row, rnr[] is the index of column. > > However, I always get the wrong results. It shows the Column too large: col 4607182418800017408 max 3! I cheked the value of rnr[]. The output snr and rnr is correct: > snr= 0 1 2 3 > rnr= 0 1 2 3 > > It seems there is something wrong when MatSetValues() is called. Following is a part of the error information. The information is shown at each loop of do II=Istart,Iend-1 > > The output (if any) follows: > snr= 0 1 2 3 > rnr= 0 1 2 3 > 8.....Check after MatGetOwnershipRange() Istart= 0 Iend= 4 > II= 0 1 0 0 > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > [0]PETSC ERROR: Argument out of range! > [0]PETSC ERROR: Column too large: col 4607182418800017408 max 3! > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Debug_PETSc_MatCreate_20101130 on a linux-gnu named compute-1-35.hpc.local.uwm by pwang_a Tue Nov 30 22:27:03 2010 > [0]PETSC ERROR: Libraries linked from /sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1/lib > [0]PETSC ERROR: Configure run at Fri Oct 8 12:59:16 2010 > [0]PETSC ERROR: Configure options --prefix=/sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1 --with-mpi-dir=/sharedapps/uwm/common/gcc-4.4.3/openmpi/1.3.2-v1 --with-blas-lapack-dir=/sharedapps/uwm/ceas/gcc-4.4.3/lapack/3.2.2-v1/lib --with-64-bit-indices=1 --with-64-bit-pointers=1 --with-large-file-io=1 --with-x=0 > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: MatSetValues_SeqAIJ() line 193 in src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatSetValues() line 992 in src/mat/interface/matrix.c > > > > !The code is as following: > !============================= > program Debug_PETSc_MatCreate_20101130 > implicit none > ! > #include "finclude/petscsys.h" > #include "finclude/petscvec.h" > #include "finclude/petscmat.h" > #include "finclude/petscpc.h" > #include "finclude/petscksp.h" > ! Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! PETSc Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > real*8 norm > PetscInt i,j,II,JJ,its !,m,n > PetscInt Istart,Iend,ione > PetscErrorCode ierr > PetscMPIInt myid,numprocs > PetscTruth flg > PetscScalar v,one,neg_one > Vec x,b,u > Mat A_petsc > KSP ksp > PetscInt,parameter:: n_nz=4 > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Other Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > !parameter::n_nz=4 > Real*8::Coef(n_nz) > PetscInt::snr(n_nz),rnr(n_nz) > data Coef /1., 2., 3. , 4./ > data snr /0, 1, 2, 3/ > data rnr /0, 1 , 2, 3/ > ! Body of Debug_PETSc_MatCreate_20101130 > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Beginning of program > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > write(*,"('snr=',4i4)")snr > write(*,"('rnr=',4i4)")rnr > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > call MatSetFromOptions(A_Petsc,ierr) > ! write(*,*)A_petsc > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > write(*,'(1a,1i7,1a,1i7)') & > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > do II=Istart,Iend-1 > ione=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,ione,snr(ione),rnr(ione) !output snr and rnr for error check > call MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione),INSERT_VALUES,ierr) > enddo > > write(*,'(1a)')'9.....Check after MatSetValues()' > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > write(*,'(1a)')'10.....Check after MatCreate()' > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > ! call KSPDestroy(ksp,ierr) > ! call VecDestroy(u,ierr) > ! call VecDestroy(x,ierr) > ! 
call VecDestroy(b,ierr) > call MatDestroy(A_petsc,ierr) > call PetscFinalize(ierr) > end program Debug_PETSc_MatCreate_20101130 > !===================================== From bsmith at mcs.anl.gov Wed Dec 1 08:12:27 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 08:12:27 -0600 Subject: [petsc-users] What's the binary function corresponding to "PetscViewerASCIIPrintf"? In-Reply-To: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> References: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> Message-ID: You can write raw strings into binary viewers with PetscViewerBinaryWrite(), BUT you will need to have your program that processes the data in the binary file be able to read out the strings. Another alternative is to use PetscViewerBinaryGetInfoPointer() and write your additional information into the corresponding .info ASCII file that is automatically generated. Again whatever program that processes your data will need to read that file to get the information out of it. Barry On Dec 1, 2010, at 12:15 AM, Li, Zhisong (lizs) wrote: > Hi, Petsc Team, > > > I once read that when we write large volume of data into an output data file, it's best to write in binary format rather than in ASCII format. I forget the origin of this statement and could not find it any more. What I found in the example code is about writing array into a binary file. Besides data stored in arrays, I also want to include some other info such as document title, data dimensions and line changing, which are not arrays, into the output file. So a "PetscViewerASCIIPrintf" is very convenient for doing this if we use ASCII format. But I could not find its correspondent function for binary format. Can you tell me how to do this? A sample code will be most helpful. > > > Best Regards, > > Zhisong Li > From pengxwang at hotmail.com Wed Dec 1 15:44:16 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 1 Dec 2010 15:44:16 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: Thanks, I changed the '1' to PetscInt ione, However, the error still comes out. do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) ^^ ^^^ enddo BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? !===The modified code is == program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! 
Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 1 15:48:47 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 15:48:47 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov Barry On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > Thanks, > > I changed the '1' to PetscInt ione, However, the error still comes out. 
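The pattern Barry describes, with the count declared as a PetscInt and explicitly set to 1 (his "PetscInt mone; mone = 1" example), looks as follows in PETSc's C interface. The thread's code is Fortran, so this is only an illustration, and the helper name InsertDiagonal and its arguments are hypothetical rather than taken from the program above.

#include <petscmat.h>

/* Hypothetical helper, not from the thread: insert coef(i) at (snr(i), rnr(i))
   for the locally owned rows Istart..Iend-1.  Every count and index argument
   is a PetscInt, so the call is also correct in a --with-64-bit-indices build. */
PetscErrorCode InsertDiagonal(Mat A,PetscInt Istart,PetscInt Iend,
                              const PetscInt snr[],const PetscInt rnr[],
                              const PetscScalar coef[])
{
  PetscInt       i,ione = 1;            /* the '1' count, typed as PetscInt */
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValues(A,ione,&snr[i],ione,&rnr[i],&coef[i],INSERT_VALUES);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

The usual MatAssemblyBegin()/MatAssemblyEnd() pair still has to follow the insertion loop, exactly as in the original program.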
> > do II=Istart,Iend-1 > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > ^^ ^^^ > enddo > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > !===The modified code is == > program Debug_PETSc_MatCreate_20101130 > implicit none > ! > #include "finclude/petscsys.h" > #include "finclude/petscvec.h" > #include "finclude/petscmat.h" > #include "finclude/petscpc.h" > #include "finclude/petscksp.h" > ! Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! PETSc Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > real*8 norm > PetscInt i,j,II,JJ,its !,m,n > PetscInt Istart,Iend,ione,mone > PetscErrorCode ierr > PetscMPIInt myid,numprocs > PetscTruth flg > PetscScalar v,one,neg_one > Vec x,b,u > Mat A_petsc > KSP ksp > PetscInt,parameter::n_nz=4 > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Other Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > PetscInt::snr(n_nz),rnr(n_nz) > !parameter::n_nz=4 > PetscReal::Coef(n_nz) > data Coef /1., 2., 3. , 4./ > data snr /0, 1, 2, 3/ > data rnr /0, 1 , 2, 3/ > ! Body of Debug_PETSc_MatCreate_20101130 > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Beginning of program > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > write(*,"('snr=',4i4)")snr > write(*,"('rnr=',4i4)")rnr > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > call MatSetFromOptions(A_Petsc,ierr) > ! write(*,*)A_petsc > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > write(*,'(1a,1i7,1a,1i7)') & > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > do II=Istart,Iend-1 > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > enddo > > write(*,'(1a)')'9.....Check after MatSetValues()' > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > write(*,'(1a)')'10.....Check after MatCreate()' > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > ! call KSPDestroy(ksp,ierr) > ! call VecDestroy(u,ierr) > ! call VecDestroy(x,ierr) > ! call VecDestroy(b,ierr) > call MatDestroy(A_petsc,ierr) > call PetscFinalize(ierr) > end program Debug_PETSc_MatCreate_20101130 > > > > From: bsmith at mcs.anl.gov > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > --with-64-bit-indices=1 > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. 
> > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > ^^^^ ^^^^^ > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > PetscInt mone > > mone = 1 > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > but better just build PETSc without the --with-64-bit-indices > > > > Barry > > > > From pengxwang at hotmail.com Thu Dec 2 11:20:36 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 11:20:36 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, Message-ID: Thanks, I changed the '1' to PetscInt ione, However, the error still comes out. do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) ^^ ^^^ enddo BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? !===The modified code is == program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! 
call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Thu Dec 2 12:17:37 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 12:17:37 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: I sent two emails for replying thie topic. However, I didn't get the email of myself from petsc-users-bounces at mcs.anl.gov . I am wondering if the email system has something wrong? Sorry if the resent email bohters anyone. In the new version of the code I defined PetscInt II,JJ,ione, mone,snr[] and rnr[], PetscReal coef[], and modified the following portion. However, the error is still there. Is there any ohter reason I didn't figure out? BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? ! ====the modified loop======= do I=Istart,Iend-1 mone=I+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) II=snr(mone) JJ=rnr(mone) v=coef(mone) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,II,ione,JJ,v,INSERT_VALUES,ierr) enddo ! ============the whole program code modified==================== program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do I=Istart,Iend-1 mone=I+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) II=snr(mone) JJ=rnr(mone) v=coef(mone) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,II,ione,JJ,v,INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 !===================End of the code============================ > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > > On Nov 30, 2010, at 10:33 PM, Peter Wang wrote: > > > I am trying to create a matrix and insert values to it. The martix is supposed to be as following: > > > > 1 0 0 0 > > 0 2 0 0 > > 0 0 3 0 > > 0 0 0 4 > > > > array coef[] is the diagonal value of the matrix, > > snr[] is the index of the row, rnr[] is the index of column. > > > > However, I always get the wrong results. It shows the Column too large: col 4607182418800017408 max 3! I cheked the value of rnr[]. The output snr and rnr is correct: > > snr= 0 1 2 3 > > rnr= 0 1 2 3 > > > > It seems there is something wrong when MatSetValues() is called. Following is a part of the error information. The information is shown at each loop of do II=Istart,Iend-1 > > > > The output (if any) follows: > > snr= 0 1 2 3 > > rnr= 0 1 2 3 > > 8.....Check after MatGetOwnershipRange() Istart= 0 Iend= 4 > > II= 0 1 0 0 > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > > [0]PETSC ERROR: Argument out of range! > > [0]PETSC ERROR: Column too large: col 4607182418800017408 max 3! > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. 
> > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Debug_PETSc_MatCreate_20101130 on a linux-gnu named compute-1-35.hpc.local.uwm by pwang_a Tue Nov 30 22:27:03 2010 > > [0]PETSC ERROR: Libraries linked from /sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1/lib > > [0]PETSC ERROR: Configure run at Fri Oct 8 12:59:16 2010 > > [0]PETSC ERROR: Configure options --prefix=/sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1 --with-mpi-dir=/sharedapps/uwm/common/gcc-4.4.3/openmpi/1.3.2-v1 --with-blas-lapack-dir=/sharedapps/uwm/ceas/gcc-4.4.3/lapack/3.2.2-v1/lib --with-64-bit-indices=1 --with-64-bit-pointers=1 --with-large-file-io=1 --with-x=0 > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: MatSetValues_SeqAIJ() line 193 in src/mat/impls/aij/seq/aij.c > > [0]PETSC ERROR: MatSetValues() line 992 in src/mat/interface/matrix.c > > > > > > > > !The code is as following: > > !============================= > > program Debug_PETSc_MatCreate_20101130 > > implicit none > > ! > > #include "finclude/petscsys.h" > > #include "finclude/petscvec.h" > > #include "finclude/petscmat.h" > > #include "finclude/petscpc.h" > > #include "finclude/petscksp.h" > > ! Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! PETSc Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > real*8 norm > > PetscInt i,j,II,JJ,its !,m,n > > PetscInt Istart,Iend,ione > > PetscErrorCode ierr > > PetscMPIInt myid,numprocs > > PetscTruth flg > > PetscScalar v,one,neg_one > > Vec x,b,u > > Mat A_petsc > > KSP ksp > > PetscInt,parameter:: n_nz=4 > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Other Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > !parameter::n_nz=4 > > Real*8::Coef(n_nz) > > PetscInt::snr(n_nz),rnr(n_nz) > > data Coef /1., 2., 3. , 4./ > > data snr /0, 1, 2, 3/ > > data rnr /0, 1 , 2, 3/ > > ! Body of Debug_PETSc_MatCreate_20101130 > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Beginning of program > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > write(*,"('snr=',4i4)")snr > > write(*,"('rnr=',4i4)")rnr > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > call MatSetFromOptions(A_Petsc,ierr) > > ! write(*,*)A_petsc > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > write(*,'(1a,1i7,1a,1i7)') & > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > do II=Istart,Iend-1 > > ione=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,ione,snr(ione),rnr(ione) !output snr and rnr for error check > > call MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione),INSERT_VALUES,ierr) > > enddo > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > write(*,'(1a)')'10.....Check after MatCreate()' > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > ! 
call KSPDestroy(ksp,ierr) > > ! call VecDestroy(u,ierr) > > ! call VecDestroy(x,ierr) > > ! call VecDestroy(b,ierr) > > call MatDestroy(A_petsc,ierr) > > call PetscFinalize(ierr) > > end program Debug_PETSc_MatCreate_20101130 > > !===================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Thu Dec 2 14:02:00 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 14:02:00 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> References: , , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, , <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Message-ID: Thanks, Dr. Simth, If the PETSc can be installed on the supurcomputer by me? The software is currently installed by the network manager of the supercomputer. The example code : ex2f.F from http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-2.3.3/src/ksp/ksp/examples/tutorials/ex2f.F is compiled and run on the same supercomputer. There is no error coming out when ex2f.F runs. I am just trying to implement my own matrix into the code. The variation in my code is that the indices of the matrix is arrays, while that in example code is II and JJ. However, in the latest version of my code, I already assigned the arrays of indices to PetscInt II,JJ. Unfortunately, the error still comes out. It's kind of confusing that the own coded program just doen't work well. Thanks for your suggestion. in> From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 15:48:47 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. > > You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). > > Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov > > > Barry > > On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > > > Thanks, > > > > I changed the '1' to PetscInt ione, However, the error still comes out. > > > > do II=Istart,Iend-1 > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > > ^^ ^^^ > > enddo > > > > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > > > !===The modified code is == > > program Debug_PETSc_MatCreate_20101130 > > implicit none > > ! > > #include "finclude/petscsys.h" > > #include "finclude/petscvec.h" > > #include "finclude/petscmat.h" > > #include "finclude/petscpc.h" > > #include "finclude/petscksp.h" > > ! Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! PETSc Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > real*8 norm > > PetscInt i,j,II,JJ,its !,m,n > > PetscInt Istart,Iend,ione,mone > > PetscErrorCode ierr > > PetscMPIInt myid,numprocs > > PetscTruth flg > > PetscScalar v,one,neg_one > > Vec x,b,u > > Mat A_petsc > > KSP ksp > > PetscInt,parameter::n_nz=4 > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! 
Other Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > PetscInt::snr(n_nz),rnr(n_nz) > > !parameter::n_nz=4 > > PetscReal::Coef(n_nz) > > data Coef /1., 2., 3. , 4./ > > data snr /0, 1, 2, 3/ > > data rnr /0, 1 , 2, 3/ > > ! Body of Debug_PETSc_MatCreate_20101130 > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Beginning of program > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > write(*,"('snr=',4i4)")snr > > write(*,"('rnr=',4i4)")rnr > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > call MatSetFromOptions(A_Petsc,ierr) > > ! write(*,*)A_petsc > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > write(*,'(1a,1i7,1a,1i7)') & > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > do II=Istart,Iend-1 > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > > enddo > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > write(*,'(1a)')'10.....Check after MatCreate()' > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > ! call KSPDestroy(ksp,ierr) > > ! call VecDestroy(u,ierr) > > ! call VecDestroy(x,ierr) > > ! call VecDestroy(b,ierr) > > call MatDestroy(A_petsc,ierr) > > call PetscFinalize(ierr) > > end program Debug_PETSc_MatCreate_20101130 > > > > > > > From: bsmith at mcs.anl.gov > > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > > To: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > > > > > --with-64-bit-indices=1 > > > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > > ^^^^ ^^^^^ > > > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > > > PetscInt mone > > > mone = 1 > > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > > > but better just build PETSc without the --with-64-bit-indices > > > > > > Barry > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 2 14:12:34 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 14:12:34 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, , <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Message-ID: <942B3EDF-32D8-42DE-94EF-9304EAD344E8@mcs.anl.gov> On Dec 2, 2010, at 2:02 PM, Peter Wang wrote: > Thanks, Dr. Simth, > > If the PETSc can be installed on the supurcomputer by me? Yes, PETSc is just a library of source code. Anyone can install it,. 
http://www.mcs.anl.gov/petsc/petsc-as/documentation/installation.html Barry > The software is currently installed by the network manager of the supercomputer. > > The example code : ex2f.F from http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-2.3.3/src/ksp/ksp/examples/tutorials/ex2f.F is compiled and run on the same supercomputer. > There is no error coming out when ex2f.F runs. I am just trying to implement my own matrix into the code. The variation in my code is that the indices of the matrix is arrays, while that in example code is II and JJ. However, in the latest version of my code, I already assigned the arrays of indices to PetscInt II,JJ. Unfortunately, the error still comes out. It's kind of confusing that the own coded program just doen't work well. > > Thanks for your suggestion. > > > in> From: bsmith at mcs.anl.gov > > Date: Wed, 1 Dec 2010 15:48:47 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. > > > > You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). > > > > Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov > > > > > > Barry > > > > On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > > > > > Thanks, > > > > > > I changed the '1' to PetscInt ione, However, the error still comes out. > > > > > > do II=Istart,Iend-1 > > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > > > ^^ ^^^ > > > enddo > > > > > > > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > > > > > !===The modified code is == > > > program Debug_PETSc_MatCreate_20101130 > > > implicit none > > > ! > > > #include "finclude/petscsys.h" > > > #include "finclude/petscvec.h" > > > #include "finclude/petscmat.h" > > > #include "finclude/petscpc.h" > > > #include "finclude/petscksp.h" > > > ! Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! PETSc Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > real*8 norm > > > PetscInt i,j,II,JJ,its !,m,n > > > PetscInt Istart,Iend,ione,mone > > > PetscErrorCode ierr > > > PetscMPIInt myid,numprocs > > > PetscTruth flg > > > PetscScalar v,one,neg_one > > > Vec x,b,u > > > Mat A_petsc > > > KSP ksp > > > PetscInt,parameter::n_nz=4 > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! Other Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > PetscInt::snr(n_nz),rnr(n_nz) > > > !parameter::n_nz=4 > > > PetscReal::Coef(n_nz) > > > data Coef /1., 2., 3. , 4./ > > > data snr /0, 1, 2, 3/ > > > data rnr /0, 1 , 2, 3/ > > > ! Body of Debug_PETSc_MatCreate_20101130 > > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! Beginning of program > > > ! 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > > write(*,"('snr=',4i4)")snr > > > write(*,"('rnr=',4i4)")rnr > > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > > call MatSetFromOptions(A_Petsc,ierr) > > > ! write(*,*)A_petsc > > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > > > write(*,'(1a,1i7,1a,1i7)') & > > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > > do II=Istart,Iend-1 > > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > > > enddo > > > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > > write(*,'(1a)')'10.....Check after MatCreate()' > > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > > ! call KSPDestroy(ksp,ierr) > > > ! call VecDestroy(u,ierr) > > > ! call VecDestroy(x,ierr) > > > ! call VecDestroy(b,ierr) > > > call MatDestroy(A_petsc,ierr) > > > call PetscFinalize(ierr) > > > end program Debug_PETSc_MatCreate_20101130 > > > > > > > > > > From: bsmith at mcs.anl.gov > > > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > > > To: petsc-users at mcs.anl.gov > > > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > > > > > > > > > --with-64-bit-indices=1 > > > > > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > > > > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > > > ^^^^ ^^^^^ > > > > > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > > > > > PetscInt mone > > > > mone = 1 > > > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > > > > > but better just build PETSc without the --with-64-bit-indices > > > > > > > > Barry > > > > > > > > > > > From vijay.m at gmail.com Thu Dec 2 14:45:25 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 14:45:25 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. Message-ID: Hi all, There is probably some minor inconsistency in my understanding but is there a fundamental difference between the following two options ? // Option 1 ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); // Option 2 ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); Here A is a shell matrix (serial) that has a routine defined to perform MATOP_MULT only. I ask because Option 1 gives me the right result while Option 2 segfaults at the MatMultAdd line. My only logical conclusion is that I need to define a MATOP_MULT_ADD or something similar for the shell for this to work. Is this understanding correct ? 
I implicitly assumed that petsc recognizes that MATOP_MULT has been defined already and since MatMultAdd only requires the action of a matrix on a vector to perform its operation, should this not be computed by petsc automatically ? I really do not want to creat an extra vector here with Option 1 since this occurs at a finer level in my calculation. But is there any way that you would suggest I do this without extra allocations ? Any comments or pointers will be much appreciated. Vijay From bsmith at mcs.anl.gov Thu Dec 2 14:51:39 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 14:51:39 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: References: Message-ID: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: > Hi all, > > There is probably some minor inconsistency in my understanding but is > there a fundamental difference between the following two options ? > > // Option 1 > ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); > ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); > // Option 2 > ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); > > Here A is a shell matrix (serial) that has a routine defined to > perform MATOP_MULT only. > > I ask because Option 1 gives me the right result while Option 2 > segfaults at the MatMultAdd line. My only logical conclusion is that I > need to define a MATOP_MULT_ADD or something similar for the shell for > this to work. Is this understanding correct ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. > I implicitly assumed > that petsc recognizes that MATOP_MULT has been defined already and > since MatMultAdd only requires the action of a matrix on a vector to > perform its operation, should this not be computed by petsc > automatically ? Sorry it doesn't though of course it could. > I really do not want to creat an extra vector here > with Option 1 since this occurs at a finer level in my calculation. > But is there any way that you would suggest I do this without extra > allocations ? Any comments or pointers will be much appreciated. I suggest making a shell MatMultAdd_Mine() that does everything then also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll have one code to routine but can handle both operations. Barry > > Vijay From vijay.m at gmail.com Thu Dec 2 15:02:12 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 15:02:12 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> Message-ID: > ? I suggest making a shell MatMultAdd_Mine() that does everything then > also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll > have one code to routine but can handle both operations. Barry, thanks for the suggestion. I will implement the MatMultAdd in this way. Puristically speaking though, this does reverse the dependency between the two routines ! Vijay On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: > > On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: > >> Hi all, >> >> There is probably some minor inconsistency in my understanding but is >> there a fundamental difference between the following two options ? 
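A minimal, self-contained sketch of registering MATOP_MULT_ADD on a shell matrix is below. The action used here (v3 = v2 + 2*x, a plain scaling) is only a stand-in for the real matrix-free operator, the routine names follow Barry's MatMultAdd_Mine/MatMult_Mine wording later in the thread, and the create/destroy calling sequences are written against a recent PETSc (they differ slightly from the 3.1-era API used in this thread).

#include <petscmat.h>

/* Stand-in matrix-free action: v3 = v2 + 2*x.  MatMultAdd() hands the shell
   routine (A, x, v2, v3). */
static PetscErrorCode MatMultAdd_Mine(Mat A,Vec x,Vec v2,Vec v3)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (v2 != v3) {ierr = VecCopy(v2,v3);CHKERRQ(ierr);}
  ierr = VecAXPY(v3,2.0,x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* MatMult as the special case: zero the output, then reuse MatMultAdd_Mine(). */
static PetscErrorCode MatMult_Mine(Mat A,Vec x,Vec y)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecSet(y,0.0);CHKERRQ(ierr);
  ierr = MatMultAdd_Mine(A,x,y,y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,b;
  PetscInt       n = 8;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  ierr = MatCreateShell(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,n,n,NULL,&A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))MatMult_Mine);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT_ADD,(void (*)(void))MatMultAdd_Mine);CHKERRQ(ierr);

  ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
  ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&b);CHKERRQ(ierr);
  ierr = VecSet(x,1.0);CHKERRQ(ierr);
  ierr = VecSet(b,1.0);CHKERRQ(ierr);

  ierr = MatMultAdd(A,x,b,b);CHKERRQ(ierr);   /* b <- b + A*x, no temporary vector */
  ierr = VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Once MATOP_MULT_ADD is registered this way, both KSP and user code can call MatMultAdd() on the shell directly, which is exactly the Option 2 form that segfaulted above.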
>> >> // Option 1 >> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >> // Option 2 >> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >> >> Here A is a shell matrix (serial) that has a routine defined to >> perform MATOP_MULT only. >> >> I ask because Option 1 gives me the right result while Option 2 >> segfaults at the MatMultAdd line. My only logical conclusion is that I >> need to define a MATOP_MULT_ADD or something similar for the shell for >> this to work. Is this understanding correct ? > > ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. > >> I implicitly assumed >> that petsc recognizes that MATOP_MULT has been defined already and >> since MatMultAdd only requires the action of a matrix on a vector to >> perform its operation, should this not be computed by petsc >> automatically ? > > ? Sorry it doesn't though of course it could. > >> I really do not want to creat an extra vector here >> with Option 1 since this occurs at a finer level in my calculation. >> But is there any way that you would suggest I do this without extra >> allocations ? Any comments or pointers will be much appreciated. > > ? I suggest making a shell MatMultAdd_Mine() that does everything then > also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll > have one code to routine but can handle both operations. > > ? Barry > > > >> >> Vijay > > From bsmith at mcs.anl.gov Thu Dec 2 18:37:56 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 18:37:56 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> Message-ID: <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> On Dec 2, 2010, at 3:02 PM, Vijay S. Mahadevan wrote: >> I suggest making a shell MatMultAdd_Mine() that does everything then >> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >> have one code to routine but can handle both operations. > > Barry, thanks for the suggestion. I will implement the MatMultAdd in > this way. Puristically speaking though, this does reverse the > dependency between the two routines ! Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. Barry > > Vijay > > On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: >> >> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: >> >>> Hi all, >>> >>> There is probably some minor inconsistency in my understanding but is >>> there a fundamental difference between the following two options ? >>> >>> // Option 1 >>> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >>> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >>> // Option 2 >>> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >>> >>> Here A is a shell matrix (serial) that has a routine defined to >>> perform MATOP_MULT only. >>> >>> I ask because Option 1 gives me the right result while Option 2 >>> segfaults at the MatMultAdd line. My only logical conclusion is that I >>> need to define a MATOP_MULT_ADD or something similar for the shell for >>> this to work. Is this understanding correct ? >> >> Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. 
>> >>> I implicitly assumed >>> that petsc recognizes that MATOP_MULT has been defined already and >>> since MatMultAdd only requires the action of a matrix on a vector to >>> perform its operation, should this not be computed by petsc >>> automatically ? >> >> Sorry it doesn't though of course it could. >> >>> I really do not want to creat an extra vector here >>> with Option 1 since this occurs at a finer level in my calculation. >>> But is there any way that you would suggest I do this without extra >>> allocations ? Any comments or pointers will be much appreciated. >> >> I suggest making a shell MatMultAdd_Mine() that does everything then >> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >> have one code to routine but can handle both operations. >> >> Barry >> >> >> >>> >>> Vijay >> >> From vijay.m at gmail.com Thu Dec 2 22:46:09 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 22:46:09 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> Message-ID: > Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. This is true and can be done but just that I truly believe in the philosophy of building higher functions based on atomistic actions. Here, MatMultAdd is just a composite of MatMult and a VecAXPY, two basic operations. From a design stand-point, it is just confusing for me to look at it as a top-down scenario. Just my 2 cents. Again, from my implementation stand-point, I am going to proceed as you suggested because I just want to get my code working for now... Vijay On Thu, Dec 2, 2010 at 6:37 PM, Barry Smith wrote: > > On Dec 2, 2010, at 3:02 PM, Vijay S. Mahadevan wrote: > >>> ? I suggest making a shell MatMultAdd_Mine() that does everything then >>> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >>> have one code to routine but can handle both operations. >> >> Barry, thanks for the suggestion. I will implement the MatMultAdd in >> this way. Puristically speaking though, this does reverse the >> dependency between the two routines ! > > ? Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. > > ?Barry > >> >> Vijay >> >> On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: >>> >>> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: >>> >>>> Hi all, >>>> >>>> There is probably some minor inconsistency in my understanding but is >>>> there a fundamental difference between the following two options ? >>>> >>>> // Option 1 >>>> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >>>> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >>>> // Option 2 >>>> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >>>> >>>> Here A is a shell matrix (serial) that has a routine defined to >>>> perform MATOP_MULT only. >>>> >>>> I ask because Option 1 gives me the right result while Option 2 >>>> segfaults at the MatMultAdd line. My only logical conclusion is that I >>>> need to define a MATOP_MULT_ADD or something similar for the shell for >>>> this to work. Is this understanding correct ? >>> >>> ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. 
>>> >>>> I implicitly assumed >>>> that petsc recognizes that MATOP_MULT has been defined already and >>>> since MatMultAdd only requires the action of a matrix on a vector to >>>> perform its operation, should this not be computed by petsc >>>> automatically ? >>> >>> ? Sorry it doesn't though of course it could. >>> >>>> I really do not want to creat an extra vector here >>>> with Option 1 since this occurs at a finer level in my calculation. >>>> But is there any way that you would suggest I do this without extra >>>> allocations ? Any comments or pointers will be much appreciated. >>> >>> ? I suggest making a shell MatMultAdd_Mine() that does everything then >>> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >>> have one code to routine but can handle both operations. >>> >>> ? Barry >>> >>> >>> >>>> >>>> Vijay >>> >>> > > From vijay.m at gmail.com Thu Dec 2 23:02:44 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 23:02:44 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? Message-ID: Hi all, I was wondering whether the MG preconditioner object is generic enough to work out of the box like say ILU or SOR. To elaborate on this, if I can provide the number of levels, restriction and prolongation operators for each level and the system operators along with vectors allocated for solution and rhs, would it work as a preconditioner for my given problem and a prescribed rhs at the finest level of PCMG. Or does it need some knowledge of the fine and coarser meshes to perform the MG operations correctly ? All the examples I've seen using MG in petsc involve the DA and DMMG objects and since I use my own mesh and corresponding discretization code for an elliptic system, I'm curious about this usage. It would not be terribly difficult to write my own framework to do a simple V-cycle with my existing framework but since petsc already provides this functionality along with different types of MG solves (with verified code!), I really want to use it for my system. Any help and/or pointers are welcome. Thanks, vijay From jed at 59A2.org Fri Dec 3 03:43:22 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 10:43:22 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 06:02, Vijay S. Mahadevan wrote: > I was wondering whether the MG preconditioner object is generic enough > to work out of the box like say ILU or SOR. To elaborate on this, if > I can provide the number of levels, restriction and prolongation > operators for each level and the system operators along with vectors > allocated for solution and rhs, would it work as a preconditioner for > my given problem and a prescribed rhs at the finest level of PCMG. Or > does it need some knowledge of the fine and coarser meshes to perform > the MG operations correctly ? > PCMG is purely algebraic so it does not need knowledge about the mesh, just interpolation/restriction. However, if you want to use non-Galerkin coarse operators, then you will need a coarse mesh (this is something that DMMG facilitates). Look at section 4.4.7 of the users manual for explanation of using PCMG. There don't seem to be any well-documented examples using PCMG directly, but you might want to look at src/ksp/examples/tests/ex19.c or src/snes/examples/tests/ex11.c. Jed -------------- next part -------------- An HTML attachment was scrubbed... 
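To make the purely algebraic setup Jed describes concrete, a sketch of the PCMG calls involved is below. It assumes the caller already has a fine-grid operator and one interpolation matrix per coarse/fine level pair; the function name and arguments are placeholders, and KSPSetOperators() is written against a recent PETSc (the 3.1-era version takes an extra MatStructure argument).

#include <petscksp.h>

/* Placeholder setup routine: nlevels levels, level 0 coarsest.  P[l] is the
   interpolation from level l-1 to level l, for l = 1..nlevels-1. */
PetscErrorCode SetupAlgebraicPCMG(KSP ksp,Mat Afine,PetscInt nlevels,Mat *P)
{
  PC             pc;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetOperators(ksp,Afine,Afine);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
  ierr = PCMGSetLevels(pc,nlevels,NULL);CHKERRQ(ierr);
  ierr = PCMGSetType(pc,PC_MG_MULTIPLICATIVE);CHKERRQ(ierr);    /* V-cycle */
  for (l = 1; l < nlevels; l++) {
    /* only interpolation is required; restriction defaults to its transpose */
    ierr = PCMGSetInterpolation(pc,l,P[l]);CHKERRQ(ierr);
  }
  /* Coarse operators: either run with -pc_mg_galerkin so PETSc forms R A P,
     as Dave mentions, or attach your own operator to each level's smoother. */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

No mesh information enters anywhere; the hierarchy is defined entirely by the interpolation matrices.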
URL: From dave.mayhem23 at gmail.com Fri Dec 3 03:44:49 2010 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 3 Dec 2010 10:44:49 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Hey Vijay, PCMG is generic. If you provide the operators for each level, along with the restriction and prolongation, you can use PCMG. It doesn't need to know about the mesh. You don't actually need to provide the coarse grid operators. Given the fine grid operator and R and optionally P, you can use Galerkin coarsening by calling PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin Also, if you don't specify the prolongation, petsc will use P = R^T. Cheers, Dave On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > Hi all, > > I was wondering whether the MG preconditioner object is generic enough > to work out of the box like say ILU or SOR. ?To elaborate on this, if > I can provide the number of levels, restriction and prolongation > operators for each level and the system operators along with vectors > allocated for solution and rhs, would it work as a preconditioner for > my given problem and a prescribed rhs at the finest level of PCMG. Or > does it need some knowledge of the fine and coarser meshes to perform > the MG operations correctly ? > > All the examples I've seen using MG in petsc involve the DA and DMMG > objects and since I use my own mesh and corresponding discretization > code for an elliptic system, I'm curious about this usage. It would > not be terribly difficult to write my own framework to do a simple > V-cycle with my existing framework but since petsc already provides > this functionality along with different types of MG solves (with > verified code!), I really want to use it for my system. Any help > and/or pointers are welcome. > > Thanks, > vijay > From knepley at gmail.com Fri Dec 3 11:02:45 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Dec 2010 11:02:45 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: I will also note that a good intro for implementing your own might be the ML PC in Petsc. It puts the ML AMG package into the PCMG framework. Matt On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > Hey Vijay, > PCMG is generic. If you provide the operators for each level, along > with the restriction and prolongation, > you can use PCMG. It doesn't need to know about the mesh. > > You don't actually need to provide the coarse grid operators. > Given the fine grid operator and R and optionally P, you can use > Galerkin coarsening by calling > PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin > Also, if you don't specify the prolongation, petsc will use P = R^T. > > > Cheers, > Dave > > > On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > > Hi all, > > > > I was wondering whether the MG preconditioner object is generic enough > > to work out of the box like say ILU or SOR. To elaborate on this, if > > I can provide the number of levels, restriction and prolongation > > operators for each level and the system operators along with vectors > > allocated for solution and rhs, would it work as a preconditioner for > > my given problem and a prescribed rhs at the finest level of PCMG. Or > > does it need some knowledge of the fine and coarser meshes to perform > > the MG operations correctly ? 
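As a concrete illustration of the purely algebraic setup Dave describes (fine operator plus restriction matrices, with Galerkin coarse operators), a sketch in C might look like the following; Afine, the restriction matrices R[l], and the vectors b and x are assumed to have been created by the application, and nlevels is only illustrative:

KSP            ksp;
PC             pc;
PetscInt       l,nlevels = 3;              /* illustrative */
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,Afine,Afine,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
ierr = PCMGSetLevels(pc,nlevels,PETSC_NULL);CHKERRQ(ierr);
for (l=1; l<nlevels; l++) {
  /* R[l] restricts from level l to level l-1; if no interpolation is set, P = R^T is used */
  ierr = PCMGSetRestriction(pc,l,R[l]);CHKERRQ(ierr);
}
/* Galerkin coarse operators (R A P on each level) can then be requested with
   -pc_mg_galerkin on the command line, or with PCMGSetGalerkin() */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Nothing in this setup refers to a mesh; the hierarchy is defined entirely by the operators and transfer matrices handed to PCMG.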
> > > > All the examples I've seen using MG in petsc involve the DA and DMMG > > objects and since I use my own mesh and corresponding discretization > > code for an elliptic system, I'm curious about this usage. It would > > not be terribly difficult to write my own framework to do a simple > > V-cycle with my existing framework but since petsc already provides > > this functionality along with different types of MG solves (with > > verified code!), I really want to use it for my system. Any help > > and/or pointers are welcome. > > > > Thanks, > > vijay > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 12:29:40 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 12:29:40 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Jed and Dave, thanks for the explanation. Now I do understand that the MG PC is generic if I have restriction/prolongation operators for every level. But I do not have the fine grid operator on hand explicitly (only a shell matrix with MatMult) and technically all my coarser grid operators will also be matrix-free. I was originally planning to hand these shell matrices to petsc as the coarse operators but can petsc do this by itself with only access to the fine grid operator ? Basically my problem is that I cannot afford to create a matrix until the coarsest level and I plan to use HYPRE BoomerAMG on the coarsest level. With this hierarchy, would MatMult enabled shell matrices provide enough leeway to get the optimal performance from MG ? Matt, I will look into the ML framework also. Thanks. Vijay On Fri, Dec 3, 2010 at 11:02 AM, Matthew Knepley wrote: > I will also note that a good intro for implementing your own might be the ML > PC > in Petsc. It puts the ML AMG package into the PCMG framework. > ?? Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: >> >> Hey Vijay, >> ?PCMG is generic. If you provide the operators for each level, along >> with the restriction and prolongation, >> you can use PCMG. It doesn't need to know about the mesh. >> >> You don't actually need to provide the coarse grid operators. >> Given the fine grid operator and R and optionally P, you can use >> Galerkin coarsening by calling >> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >> Also, if you don't specify the prolongation, petsc will use P = R^T. >> >> >> Cheers, >> ?Dave >> >> >> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >> > Hi all, >> > >> > I was wondering whether the MG preconditioner object is generic enough >> > to work out of the box like say ILU or SOR. ?To elaborate on this, if >> > I can provide the number of levels, restriction and prolongation >> > operators for each level and the system operators along with vectors >> > allocated for solution and rhs, would it work as a preconditioner for >> > my given problem and a prescribed rhs at the finest level of PCMG. Or >> > does it need some knowledge of the fine and coarser meshes to perform >> > the MG operations correctly ? >> > >> > All the examples I've seen using MG in petsc involve the DA and DMMG >> > objects and since I use my own mesh and corresponding discretization >> > code for an elliptic system, I'm curious about this usage. 
It would >> > not be terribly difficult to write my own framework to do a simple >> > V-cycle with my existing framework but since petsc already provides >> > this functionality along with different types of MG solves (with >> > verified code!), I really want to use it for my system. Any help >> > and/or pointers are welcome. >> > >> > Thanks, >> > vijay >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From rlmackie862 at gmail.com Fri Dec 3 12:33:39 2010 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 3 Dec 2010 10:33:39 -0800 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Are there any examples that show how to use the ML PC? Randy On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: > I will also note that a good intro for implementing your own might be the ML PC > in Petsc. It puts the ML AMG package into the PCMG framework. > > Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > Hey Vijay, > PCMG is generic. If you provide the operators for each level, along > with the restriction and prolongation, > you can use PCMG. It doesn't need to know about the mesh. > > You don't actually need to provide the coarse grid operators. > Given the fine grid operator and R and optionally P, you can use > Galerkin coarsening by calling > PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin > Also, if you don't specify the prolongation, petsc will use P = R^T. > > > Cheers, > Dave > > > On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > > Hi all, > > > > I was wondering whether the MG preconditioner object is generic enough > > to work out of the box like say ILU or SOR. To elaborate on this, if > > I can provide the number of levels, restriction and prolongation > > operators for each level and the system operators along with vectors > > allocated for solution and rhs, would it work as a preconditioner for > > my given problem and a prescribed rhs at the finest level of PCMG. Or > > does it need some knowledge of the fine and coarser meshes to perform > > the MG operations correctly ? > > > > All the examples I've seen using MG in petsc involve the DA and DMMG > > objects and since I use my own mesh and corresponding discretization > > code for an elliptic system, I'm curious about this usage. It would > > not be terribly difficult to write my own framework to do a simple > > V-cycle with my existing framework but since petsc already provides > > this functionality along with different types of MG solves (with > > verified code!), I really want to use it for my system. Any help > > and/or pointers are welcome. > > > > Thanks, > > vijay > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 3 12:34:21 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:34:21 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 19:29, Vijay S. Mahadevan wrote: > Jed and Dave, thanks for the explanation. Now I do understand that the > MG PC is generic if I have restriction/prolongation operators for > every level. 
But I do not have the fine grid operator on hand > explicitly (only a shell matrix with MatMult) and technically all my > coarser grid operators will also be matrix-free. I was originally > planning to hand these shell matrices to petsc as the coarse > operators but can petsc do this by itself with only access to the fine > grid operator ? > PETSc cannot magically create intermediate-level shell operators. If you want to do everything matrix-free, then you have to provide everything that needs to be "smart": restriction/interpolation, smoothers, and residuals. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 3 12:36:32 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:36:32 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 19:33, Randall Mackie wrote: > Are there any examples that show how to use the ML PC? Run any example that uses assembled matrices (almost all of them) with -pc_type ml. It exposes the full multigrid hierarchy so all the usual options for PCMG work. ML-specific options are only available through the options database (we don't have interface functions for all of them). Run with -pc_type ml -help | grep pc_ml_ to see them. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 3 12:36:54 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Dec 2010 12:36:54 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 12:33 PM, Randall Mackie wrote: > Are there any examples that show how to use the ML PC? > -pc_type ml It is Algebraic Multigrid. Matt > Randy > > > On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: > > I will also note that a good intro for implementing your own might be the > ML PC > in Petsc. It puts the ML AMG package into the PCMG framework. > > Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > >> Hey Vijay, >> PCMG is generic. If you provide the operators for each level, along >> with the restriction and prolongation, >> you can use PCMG. It doesn't need to know about the mesh. >> >> You don't actually need to provide the coarse grid operators. >> Given the fine grid operator and R and optionally P, you can use >> Galerkin coarsening by calling >> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >> Also, if you don't specify the prolongation, petsc will use P = R^T. >> >> >> Cheers, >> Dave >> >> >> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >> > Hi all, >> > >> > I was wondering whether the MG preconditioner object is generic enough >> > to work out of the box like say ILU or SOR. To elaborate on this, if >> > I can provide the number of levels, restriction and prolongation >> > operators for each level and the system operators along with vectors >> > allocated for solution and rhs, would it work as a preconditioner for >> > my given problem and a prescribed rhs at the finest level of PCMG. Or >> > does it need some knowledge of the fine and coarser meshes to perform >> > the MG operations correctly ? 
>> > >> > All the examples I've seen using MG in petsc involve the DA and DMMG >> > objects and since I use my own mesh and corresponding discretization >> > code for an elliptic system, I'm curious about this usage. It would >> > not be terribly difficult to write my own framework to do a simple >> > V-cycle with my existing framework but since petsc already provides >> > this functionality along with different types of MG solves (with >> > verified code!), I really want to use it for my system. Any help >> > and/or pointers are welcome. >> > >> > Thanks, >> > vijay >> > >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 12:43:15 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 12:43:15 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: Matt, I have used the Hypre AMG option in the past but have not tried ML AMG before. Are there any added advantages in terms of performance/memory footprint and such between the two ? Vijay On Fri, Dec 3, 2010 at 12:36 PM, Matthew Knepley wrote: > On Fri, Dec 3, 2010 at 12:33 PM, Randall Mackie > wrote: >> >> Are there any examples that show how to use the ML PC? > > -pc_type ml > It is Algebraic Multigrid. > ?? Matt > >> >> Randy >> >> On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: >> >> I will also note that a good intro for implementing your own might be the >> ML PC >> in Petsc. It puts the ML AMG package into the PCMG framework. >> ?? Matt >> >> On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: >>> >>> Hey Vijay, >>> ?PCMG is generic. If you provide the operators for each level, along >>> with the restriction and prolongation, >>> you can use PCMG. It doesn't need to know about the mesh. >>> >>> You don't actually need to provide the coarse grid operators. >>> Given the fine grid operator and R and optionally P, you can use >>> Galerkin coarsening by calling >>> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >>> Also, if you don't specify the prolongation, petsc will use P = R^T. >>> >>> >>> Cheers, >>> ?Dave >>> >>> >>> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >>> > Hi all, >>> > >>> > I was wondering whether the MG preconditioner object is generic enough >>> > to work out of the box like say ILU or SOR. ?To elaborate on this, if >>> > I can provide the number of levels, restriction and prolongation >>> > operators for each level and the system operators along with vectors >>> > allocated for solution and rhs, would it work as a preconditioner for >>> > my given problem and a prescribed rhs at the finest level of PCMG. Or >>> > does it need some knowledge of the fine and coarser meshes to perform >>> > the MG operations correctly ? >>> > >>> > All the examples I've seen using MG in petsc involve the DA and DMMG >>> > objects and since I use my own mesh and corresponding discretization >>> > code for an elliptic system, I'm curious about this usage. 
It would >>> > not be terribly difficult to write my own framework to do a simple >>> > V-cycle with my existing framework but since petsc already provides >>> > this functionality along with different types of MG solves (with >>> > verified code!), I really want to use it for my system. Any help >>> > and/or pointers are welcome. >>> > >>> > Thanks, >>> > vijay >>> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From jed at 59A2.org Fri Dec 3 12:47:57 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:47:57 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 19:43, Vijay S. Mahadevan wrote: > Are there any added advantages in terms of > performance/memory footprint and such between the two ? > Generally, ML takes less memory and it returns coarse level operators to PETSc so you have a lot more flexibility (you can use all of PETSc's preconditioners as smoothers, you can control each level independently, and you have lots of options to solve the coarse-level problem). ML needs fewer levels and has lower setup costs. BoomerAMG usually produces a more robust hierarchy so it works for some problems that ML does not. It is basically a black box so you only have the flexibility that they specifically provided (rather than everything in PETSc plus whatever you might want to do). Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 13:20:54 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 13:20:54 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > PETSc cannot magically create intermediate-level shell operators. If you > want to do everything matrix-free, then you have to provide everything that > needs to be "smart": restriction/interpolation, smoothers, and residuals. This is fine and I am more than happy to hand these to Petsc. I do have the restriction/prolongation matrices explicitly but only the operators are matrix-free for now. So if I start off with a matrix-free PC operator, how exactly do I provide the shell matrix operators for all the other levels ? I do not see any routine to enable this and my conclusion is that petsc does find it using the fine grid operator at hand using R^T*A*R transformation. If this is the case for all levels, then it avalanches into a lot of fine-grid matrix vector products. I hope this is not the way it is done. My other line of thinking is to directly manipulate the KSP/PC operators at every level and replace them with the correct shell matrices. But I am not sure what is the recommended procedure here. All comments/suggestions welcome. Vijay On Fri, Dec 3, 2010 at 12:34 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 19:29, Vijay S. Mahadevan wrote: >> >> Jed and Dave, thanks for the explanation. Now I do understand that the >> MG PC is generic if I have restriction/prolongation operators for >> every level. 
But I do not have the fine grid operator on hand >> explicitly (only a shell matrix with MatMult) and technically all my >> coarser grid operators will also be matrix-free. I was originally >> planning to hand these shell matrices to petsc as the coarse >> operators but can petsc do this by itself with only access to the fine >> grid operator ? > > PETSc cannot magically create intermediate-level shell operators. ?If you > want to do everything matrix-free, then you have to provide everything that > needs to be "smart": restriction/interpolation, smoothers, and residuals. > Jed From jed at 59A2.org Fri Dec 3 13:27:01 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 20:27:01 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 20:20, Vijay S. Mahadevan wrote: > This is fine and I am more than happy to hand these to Petsc. I do > have the restriction/prolongation matrices explicitly but only the > operators are matrix-free for now. So if I start off with a > matrix-free PC operator, how exactly do I provide the shell matrix > operators for all the other levels ? > PCMGSetResidual, PCMGGetSmoother and set it to whatever type you want (maybe your custom one). > I do not see any routine to enable this and my conclusion is that > petsc does find it using the fine grid operator at hand using R^T*A*R > transformation. > Galerkin coarse operators really need to be formed (not applied as a product) to make sense. PCMG calls MatPtAP(), Galerkin coarse operators will not work unless this is implemented. You want to provide the matrix yourself. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 13:53:32 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 13:53:32 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > Galerkin coarse operators really need to be formed (not applied as a > product) to make sense. PCMG calls MatPtAP(), Galerkin coarse operators > will not work unless this is implemented. You want to provide the matrix > yourself. So this does require the fine grid matrix to be formed explicitly and handed over to the top level KSP solver ? Since I start with a shell matrix and can only provide action of the fine operator on a vector, this does pose a problem. I will implement based on the shell matrix for now and see at which point the requirements are stopping me. Thanks for all the help Jed. I will post here again if I have more questions. Vijay On Fri, Dec 3, 2010 at 1:27 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 20:20, Vijay S. Mahadevan wrote: >> >> This is fine and I am more than happy to hand these to Petsc. I do >> have the restriction/prolongation matrices explicitly but only the >> operators are matrix-free for now. So if I start off with a >> matrix-free PC operator, how exactly do I provide the shell matrix >> operators for all the other levels ? > > PCMGSetResidual, PCMGGetSmoother and set it to whatever type you want (maybe > your custom one). > >> >> I do not see any routine to enable this and my conclusion is that >> petsc does find it using the fine grid operator at hand using R^T*A*R >> transformation. > > Galerkin coarse operators really need to be formed (not applied as a > product) to make sense. ?PCMG calls MatPtAP(), Galerkin coarse operators > will not work unless this is implemented. ?You want to provide the matrix > yourself. 
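For reference, when the fine-grid operator is assembled, the triple product that PCMG forms internally for a Galerkin coarse level can also be built by hand with MatPtAP(); a two-line sketch, where Afine (an assembled fine-grid matrix) and P (the interpolation from the coarse level) are hypothetical names:

Mat Acoarse;
ierr = MatPtAP(Afine,P,MAT_INITIAL_MATRIX,2.0,&Acoarse);CHKERRQ(ierr);  /* Acoarse = P^T * Afine * P */

This is exactly the operation that cannot be performed when Afine is only available as a shell, which is why the coarse operators have to be supplied some other way in the matrix-free case.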
> Jed From jed at 59A2.org Fri Dec 3 13:56:20 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 20:56:20 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 20:53, Vijay S. Mahadevan wrote: > So this does require the fine grid matrix to be formed explicitly and > handed over to the top level KSP solver ? > If you want to use Galerkin coarse operators, then you have to assemble the fine-grid matrix. This is an algorithmic issue, not a matter of PETSc's API or something like that. If you can provide matrix-free residuals/smoothers, then you don't need to use the Galerkin procedure to build coarse operators. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 14:09:01 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 14:09:01 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > If you want to use Galerkin coarse operators, then you have to assemble the > fine-grid matrix. This is an algorithmic issue, not a matter of PETSc's API > or something like that. If you can provide matrix-free residuals/smoothers, > then you don't need to use the Galerkin procedure to build coarse operators. Ah, I misunderstood your explanation earlier. If I do provide the restriction/prolongation along with a fine-grid shell matrix and opt to not use Galerkin MG, then how do I provide the coarse grid operators to petsc ? I also just remembered from one of your earlier posts that you mentioned the use of non-Galerkin coarse operators requires a coarse mesh to be provided. Since my code does not use DMMG at all but is rather based on an unstructured grid setting using libMesh, I do not know how to proceed here. And I dont quite get what a matrix-free residual is.. Wouldn?t PCMGDefaultResidual compute the residual with just MatMult operation defined (b-Ax) for every level ? Why do I need a custom residual operator ? On Fri, Dec 3, 2010 at 1:56 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 20:53, Vijay S. Mahadevan wrote: >> >> So this does require the fine grid matrix to be formed explicitly and >> handed over to the top level KSP solver ? > > If you want to use Galerkin coarse operators, then you have to assemble the > fine-grid matrix. ?This is an algorithmic issue, not a matter of PETSc's API > or something like that. ?If you can provide matrix-free residuals/smoothers, > then you don't need to use the Galerkin procedure to build coarse operators. > Jed From jed at 59A2.org Fri Dec 3 14:16:01 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 21:16:01 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: > Ah, I misunderstood your explanation earlier. If I do provide the > restriction/prolongation along with a fine-grid shell matrix and opt > to not use Galerkin MG, then how do I provide the coarse grid > operators to petsc? > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > I also just remembered from one of your earlier > posts that you mentioned the use of non-Galerkin coarse operators > requires a coarse mesh to be provided. > No, this is not required. PCMG's interface is purely algebraic, you do not need to use DMMG or otherwise provide a "mesh". You have to provide coarse-level operators (as described above). This is all in the users manual. 
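Putting those pieces together, a sketch of the per-level setup for a matrix-free hierarchy might look like the following; the shell operators Ashell[l] (needing only MatMult), the interpolation matrices P[l], an assembled coarsest-level matrix Acoarse, and the vectors b and x are assumed to be provided by the application, and nlevels is only illustrative:

KSP            ksp,smooth,coarse;
PC             pc;
PetscInt       l,nlevels = 4;              /* illustrative */
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,Ashell[nlevels-1],Ashell[nlevels-1],SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
ierr = PCMGSetLevels(pc,nlevels,PETSC_NULL);CHKERRQ(ierr);
for (l=1; l<nlevels; l++) {
  ierr = PCMGSetInterpolation(pc,l,P[l]);CHKERRQ(ierr);                /* level l-1 -> level l */
  ierr = PCMGGetSmoother(pc,l,&smooth);CHKERRQ(ierr);
  ierr = KSPSetOperators(smooth,Ashell[l],Ashell[l],SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = PCMGSetResidual(pc,l,PCMGDefaultResidual,Ashell[l]);CHKERRQ(ierr);
}
/* coarsest level: an assembled operator, so that a direct solver or an AMG
   such as BoomerAMG can be used there */
ierr = PCMGGetCoarseSolve(pc,&coarse);CHKERRQ(ierr);
ierr = KSPSetOperators(coarse,Acoarse,Acoarse,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Because the level operators are shells, the level smoothers must be something that needs only MatMult, for example Krylov smoothing selected with -mg_levels_ksp_type gmres -mg_levels_pc_type none; the default smoother settings generally assume an assembled matrix.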
> And I dont quite get what a matrix-free residual is.. Wouldn?t > PCMGDefaultResidual compute the residual with just MatMult operation > defined (b-Ax) for every level ? Why do I need a custom residual > operator ? > If you have wrapped your coarse-level operator in MatShell, then you can just pass that in and use PCMGDefaultResidual. Also from the users manual: *The residual() function can be set to be PCMGDefaultResidual() if one?s operator is stored in a Mat format.* *In certain circumstances, where it is much cheaper to calculate the residual directly, rather than through the* *usual formula b ? Ax, the user may wish to provide an alternative.* Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylbar.vainbot at gmail.com Fri Dec 3 14:25:58 2010 From: sylbar.vainbot at gmail.com (Sylvain Barbot) Date: Fri, 3 Dec 2010 12:25:58 -0800 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Hi all, Very interesting discussion. It would be great to have available a simple multi-grid example using matrix-free methods. I think it would greatly clarify the existing documentation about the multi-grid tools in Petsc. I suggest showing a simple 1-D linear ODE solved with multigrid, so that one can focus on the architecture. Existing documentation is clear only retrospectively to most of us. Best wishes, Sylvain 2010/12/3 Jed Brown : > On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: >> >> Ah, I misunderstood your explanation earlier. If I do provide the >> restriction/prolongation along with a fine-grid shell matrix and opt >> to not use Galerkin MG, then how do I provide the coarse grid >> operators to petsc? > > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > >> >> I also just remembered from one of your earlier >> posts that you mentioned the use of non-Galerkin coarse operators >> requires a coarse mesh to be provided. > > No, this is not required. PCMG's interface is purely algebraic, you do not > need to use DMMG or otherwise provide a "mesh". You have to provide > coarse-level operators (as described above). This is all in the users > manual. > >> >> And I dont quite get what a matrix-free residual is.. Wouldn?t >> PCMGDefaultResidual compute the residual with just MatMult operation >> defined (b-Ax) for every level ? Why do I need a custom residual >> operator ? > > If you have wrapped your coarse-level operator in MatShell, then you can > just pass that in and use PCMGDefaultResidual. Also from the users manual: > The residual() function can be set to be PCMGDefaultResidual() if one's > operator is stored in a Mat format. > In certain circumstances, where it is much cheaper to calculate the residual > directly, rather than through the > usual formula b - Ax, the user may wish to provide an alternative. > Jed From vijay.m at gmail.com Fri Dec 3 14:37:35 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 14:37:35 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: The custom residual makes sense now. I can find the residual itself in the same way my matrix application works and this should result in some savings.. Jed, thanks a ton for these detailed explanations. I think I understand enough to get going with this. If I hit a roadblock, I will post a question here. Thanks and have a great day. Vijay On Fri, Dec 3, 2010 at 2:16 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 21:09, Vijay S. 
Mahadevan wrote: >> >> Ah, I misunderstood your explanation earlier. If I do provide the >> restriction/prolongation along with a fine-grid shell matrix and opt >> to not use Galerkin MG, then how do I provide the coarse grid >> operators to petsc? > > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > >> >> I also just remembered from one of your earlier >> posts that you mentioned the use of non-Galerkin coarse operators >> requires a coarse mesh to be provided. > > No, this is not required. PCMG's interface is purely algebraic, you do not > need to use DMMG or otherwise provide a "mesh". You have to provide > coarse-level operators (as described above). This is all in the users > manual. > >> >> And I dont quite get what a matrix-free residual is.. Wouldn?t >> PCMGDefaultResidual compute the residual with just MatMult operation >> defined (b-Ax) for every level ? Why do I need a custom residual >> operator ? > > If you have wrapped your coarse-level operator in MatShell, then you can > just pass that in and use PCMGDefaultResidual. Also from the users manual: > The residual() function can be set to be PCMGDefaultResidual() if one's > operator is stored in a Mat format. > In certain circumstances, where it is much cheaper to calculate the residual > directly, rather than through the > usual formula b - Ax, the user may wish to provide an alternative. > Jed From balay at mcs.anl.gov Fri Dec 3 15:18:18 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 3 Dec 2010 15:18:18 -0600 (CST) Subject: [petsc-users] potential e-mail disruption due to power outage Message-ID: We are having a power outage this weekend at MCS [staring friday evening, and ending sunday evening or monday morning CST] e-mail, mailing-lists, ftp, web servers are supposed to work during this outage. But if something isn't working - and we are unable to respond by e-mail, you now know the reason. Satish From xdliang at gmail.com Fri Dec 3 16:19:31 2010 From: xdliang at gmail.com (Xiangdong Liang) Date: Fri, 3 Dec 2010 17:19:31 -0500 Subject: [petsc-users] run direct linear solver in parallel Message-ID: Hi everyone, I am wondering how I can run the direct solver in parallel. I can run my program in a single processor with direct linear solver by ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles However, when I try to run it with mpi: mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles I got error like this: [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: No support for this operation for this object type! [0]PETSC ERROR: Matrix type mpiaij symbolic LU! 
[0]PETSC ERROR: Libraries linked from /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 [0]PETSC ERROR: Configure options --with-shared --with-dynamic --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c ------------------------------------------------------- Would you like to tell me where I am doing wrong? I appreciate your help. Xiangdong From jed at 59A2.org Fri Dec 3 16:22:06 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 23:22:06 +0100 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 23:19, Xiangdong Liang wrote: > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > 2.3.3 is very old, you should upgrade to petsc-3.1 Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 3 16:22:24 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 3 Dec 2010 16:22:24 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> You are using an ANCIENT version of PETSc but using documentation from a much newer version. You should upgrade to petsc-3.1 immediately then life will be easy :-) Barry On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > Hi everyone, > > I am wondering how I can run the direct solver in parallel. I can run > my program in a single processor with direct linear solver by > > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles > > However, when I try to run it with mpi: > > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package spooles > > I got error like this: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: No support for this operation for this object type! > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! 
> > [0]PETSC ERROR: Libraries linked from > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 > [0]PETSC ERROR: Configure options --with-shared --with-dynamic > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" > --with-superlu=1 --with-superlu-include=/usr/include/superlu > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 > --with-spooles-include=/usr/include/spooles > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c > ------------------------------------------------------- > > Would you like to tell me where I am doing wrong? I appreciate your help. > > Xiangdong From hzhang at mcs.anl.gov Fri Dec 3 19:47:48 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 3 Dec 2010 19:47:48 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> References: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> Message-ID: Note: spooles has been out of support from its developers for more than 10 years. Recommend to use superlu_dist or mumps. Hong On Fri, Dec 3, 2010 at 4:22 PM, Barry Smith wrote: > > ?You are using an ANCIENT version of PETSc but using documentation from a much newer version. You should upgrade to petsc-3.1 immediately then life will be easy :-) > > > ? Barry > > On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> Hi everyone, >> >> I am wondering how I can run the direct solver in parallel. I can run >> my program in a single processor with direct linear solver by >> >> ./foo.out ?-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >> >> However, when I try to run it with mpi: >> >> mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >> -pc_factor_mat_solver_package spooles >> >> I got error like this: >> >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: No support for this operation for this object type! >> [0]PETSC ERROR: Matrix type mpiaij ?symbolic LU! 
>> >> [0]PETSC ERROR: Libraries linked from >> /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >> [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >> [0]PETSC ERROR: Configure options --with-shared --with-dynamic >> --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >> --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >> --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >> --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >> --with-superlu=1 --with-superlu-include=/usr/include/superlu >> --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >> --with-spooles-include=/usr/include/spooles >> --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >> --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >> [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >> ------------------------------------------------------- >> >> Would you like to tell me where I am doing wrong? I appreciate your help. >> >> Xiangdong > > From chetan.jhurani at gmail.com Mon Dec 6 21:47:36 2010 From: chetan.jhurani at gmail.com (Chetan Jhurani) Date: Mon, 6 Dec 2010 20:47:36 -0700 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output in KSPSolve. Message-ID: <4A1257D9430643B892E650250EBC86BD@spiff> Hello, I have a small code that solves a 3x3 system with and without SuperLU. If SuperLU is used, vectors x and b (in A x = b) come out flipped after KSPSolve. Am I doing something stupid? The two outputs, the code, and the configure command are pasted below. Thanks, Chetan -------------------------------------------------------------------------- Run with SuperLU: ~> lu_test -pc_factor_mat_solver_package superlu x before solve -10 -10 -10 b before solve 200 200 200 x after solve 200 200 200 b after solve 0.2 0.2 0.2 -------------------------------------------------------------------------- Run without SuperLU (default KSP, PC options) ~> lu_test x before solve -10 -10 -10 b before solve 200 200 200 x after solve 0.2 0.2 0.2 b after solve 200 200 200 -------------------------------------------------------------------------- Program: int main(int argc, char* argv[]) { PetscErrorCode ierr; ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); CHKERRQ(ierr); Mat A; ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); CHKERRQ(ierr); ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); ierr = MatShift(A, 1000); CHKERRQ(ierr); Vec x, b; ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); ierr = VecShift(x, -10); CHKERRQ(ierr); ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); ierr = VecShift(b, 200); CHKERRQ(ierr); KSP ksp; ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); printf("x before solve\n"); ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); printf("b before solve\n"); ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); CHKERRQ(ierr); ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); printf("x after solve\n"); ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); printf("b after solve\n"); ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); ierr = PetscFinalize(); CHKERRQ(ierr); } 
-------------------------------------------------------------------------- configure command: python ./config/configure.py -with-clanguage=C++ -with-debugging=no --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 --with-superlu=1 --with-hypre=1 --download-umfpack=1 --download-superlu=1 --download-hypre=1 --with-mumps=1 --download-mumps=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-blacs=1 --download-blacs=1 -with-c-support=1 -------------------------------------------------------------------------- From hzhang at mcs.anl.gov Mon Dec 6 22:57:35 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 6 Dec 2010 22:57:35 -0600 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output in KSPSolve. In-Reply-To: <4A1257D9430643B892E650250EBC86BD@spiff> References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: Chetan: Move ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); right before ierr = KSPSolve() and run your code with -pc_type lu -pc_factor_mat_solver_package superlu I got correct answer. Attached is my modified code. Hong > > I have a small code that solves a 3x3 system with and without > SuperLU. ?If SuperLU is used, vectors x and b (in A x = b) come > out flipped after KSPSolve. ?Am I doing something stupid? ?The two > outputs, the code, and the configure command are pasted below. > > Thanks, > > Chetan > > -------------------------------------------------------------------------- > > Run with SuperLU: > > ~> lu_test -pc_factor_mat_solver_package superlu > x before solve > -10 > -10 > -10 > b before solve > 200 > 200 > 200 > x after solve > 200 > 200 > 200 > b after solve > 0.2 > 0.2 > 0.2 > > -------------------------------------------------------------------------- > > Run without SuperLU (default KSP, PC options) > > ~> lu_test > x before solve > -10 > -10 > -10 > b before solve > 200 > 200 > 200 > x after solve > 0.2 > 0.2 > 0.2 > b after solve > 200 > 200 > 200 > > -------------------------------------------------------------------------- > > Program: > > int main(int argc, char* argv[]) > { > ? ?PetscErrorCode ierr; > > ? ?ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); > CHKERRQ(ierr); > > ? ?Mat A; > ? ?ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); > CHKERRQ(ierr); > ? ?ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > ? ?ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > ? ?ierr = MatShift(A, 1000); CHKERRQ(ierr); > > ? ?Vec x, b; > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); > ? ?ierr = VecShift(x, -10); CHKERRQ(ierr); > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); > ? ?ierr = VecShift(b, 200); CHKERRQ(ierr); > > ? ?KSP ksp; > ? ?ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); > ? ?ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > ? ?printf("x before solve\n"); > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > ? ?printf("b before solve\n"); > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); CHKERRQ(ierr); > ? ?ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > ? ?printf("x after solve\n"); > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > ? ?printf("b after solve\n"); > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? 
?ierr = PetscFinalize(); CHKERRQ(ierr); > } > > -------------------------------------------------------------------------- > > configure command: > > python ./config/configure.py -with-clanguage=C++ -with-debugging=no > --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 --with-superlu=1 > --with-hypre=1 --download-umfpack=1 --download-superlu=1 > --download-hypre=1 --with-mumps=1 --download-mumps=1 --with-parmetis=1 > --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 > --with-blacs=1 --download-blacs=1 -with-c-support=1 > > -------------------------------------------------------------------------- > > -------------- next part -------------- A non-text attachment was scrubbed... Name: chetan.c Type: text/x-csrc Size: 1494 bytes Desc: not available URL: From chetan.jhurani at gmail.com Mon Dec 6 23:25:52 2010 From: chetan.jhurani at gmail.com (Chetan Jhurani) Date: Mon, 6 Dec 2010 22:25:52 -0700 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output inKSPSolve. In-Reply-To: References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: <47EF0668992E46D683DA71BB94E8A485@spiff> Thanks Hong. It does work now. Chetan > -----Original Message----- > From: petsc-users-bounces at mcs.anl.gov > [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Hong Zhang > Sent: Monday, December 06, 2010 09:58 PM > To: PETSc users list > Subject: Re: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - > flipped output inKSPSolve. > > Chetan: > > Move > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > right before > ierr = KSPSolve() > and run your code with > -pc_type lu -pc_factor_mat_solver_package superlu > > I got correct answer. > Attached is my modified code. > > Hong > > > > I have a small code that solves a 3x3 system with and without > > SuperLU. ?If SuperLU is used, vectors x and b (in A x = b) come > > out flipped after KSPSolve. ?Am I doing something stupid? ?The two > > outputs, the code, and the configure command are pasted below. > > > > Thanks, > > > > Chetan > > > > > -------------------------------------------------------------- > ------------ > > > > Run with SuperLU: > > > > ~> lu_test -pc_factor_mat_solver_package superlu > > x before solve > > -10 > > -10 > > -10 > > b before solve > > 200 > > 200 > > 200 > > x after solve > > 200 > > 200 > > 200 > > b after solve > > 0.2 > > 0.2 > > 0.2 > > > > > -------------------------------------------------------------- > ------------ > > > > Run without SuperLU (default KSP, PC options) > > > > ~> lu_test > > x before solve > > -10 > > -10 > > -10 > > b before solve > > 200 > > 200 > > 200 > > x after solve > > 0.2 > > 0.2 > > 0.2 > > b after solve > > 200 > > 200 > > 200 > > > > > -------------------------------------------------------------- > ------------ > > > > Program: > > > > int main(int argc, char* argv[]) > > { > > ? ?PetscErrorCode ierr; > > > > ? ?ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); > > CHKERRQ(ierr); > > > > ? ?Mat A; > > ? ?ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); > > CHKERRQ(ierr); > > ? ?ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > ? ?ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > > > ? ?ierr = MatShift(A, 1000); CHKERRQ(ierr); > > > > ? ?Vec x, b; > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); > > ? ?ierr = VecShift(x, -10); CHKERRQ(ierr); > > > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); > > ? ?ierr = VecShift(b, 200); CHKERRQ(ierr); > > > > ? ?KSP ksp; > > ? 
?ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); > > ? ?ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > > > ? ?printf("x before solve\n"); > > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?printf("b before solve\n"); > > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > > > ? ?ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); > CHKERRQ(ierr); > > ? ?ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > > > ? ?printf("x after solve\n"); > > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?printf("b after solve\n"); > > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > > > ? ?ierr = PetscFinalize(); CHKERRQ(ierr); > > } > > > > > -------------------------------------------------------------- > ------------ > > > > configure command: > > > > python ./config/configure.py -with-clanguage=C++ -with-debugging=no > > --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 > --with-superlu=1 > > --with-hypre=1 --download-umfpack=1 --download-superlu=1 > > --download-hypre=1 --with-mumps=1 --download-mumps=1 > --with-parmetis=1 > > --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 > > --with-blacs=1 --download-blacs=1 -with-c-support=1 > > > > > -------------------------------------------------------------- > ------------ > > > > > From vijay.m at gmail.com Tue Dec 7 00:42:52 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 7 Dec 2010 00:42:52 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Jed, I stumbled onto an issue and it is probably my lack of complete understanding still. I'll try to be as clear as possible but if it is unclear, do let me know. When I use PCMG as a preconditioner to solve a fine grid system using a linear solver, how do I set the interpolation to the linear system being solved. i.e., my preconditioner starts at max_levels and hierarchy proceeds to 0 (coarsest) and my linear system is technically at max_levels+1. I have a vector length incompatibility since I cannot seem to set the prolongation to go from max_levels to max_levels+1, where the linear solver subspace resides. I am probably not setting the right projection matrices at the right levels but it is slightly confusing since the coarsest level does not take any projection matrices. Hence my question is, when PCMG proceeds with the richardson iteration at every linear iteration on finest grid problem (with say FGMRES), how does the action of the preconditioner return a vector of right length ? I thought of adding the original fine grid problem to the PCMG levels (now max_levels=max_levels+1) but this by philosophy uses PCMG itself as a solver. Does it not ? Or did I misunderstand and make a wrong assumption here ? I've been stuck on this for a while and looking at DMMG implementation didn't help with my problem either. Any help is much appreciated. Vijay On Fri, Dec 3, 2010 at 2:37 PM, Vijay S. Mahadevan wrote: > The custom residual makes sense now. I can find the residual itself in > the same way my matrix application works and this should result in > some savings.. > > Jed, thanks a ton for these detailed explanations. I think I > understand enough to get going with this. If I hit a roadblock, I will > post a question here. Thanks and have a great day. > > Vijay > > On Fri, Dec 3, 2010 at 2:16 PM, Jed Brown wrote: >> On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: >>> >>> Ah, I misunderstood your explanation earlier. 
If I do provide the >>> restriction/prolongation along with a fine-grid shell matrix and opt >>> to not use Galerkin MG, then how do I provide the coarse grid >>> operators to petsc? >> >> PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). >> >>> >>> I also just remembered from one of your earlier >>> posts that you mentioned the use of non-Galerkin coarse operators >>> requires a coarse mesh to be provided. >> >> No, this is not required. PCMG's interface is purely algebraic, you do not >> need to use DMMG or otherwise provide a "mesh". ?You have to provide >> coarse-level operators (as described above). ?This is all in the users >> manual. >> >>> >>> And I dont quite get what a matrix-free residual is.. Wouldn?t >>> PCMGDefaultResidual compute the residual with just MatMult operation >>> defined (b-Ax) for every level ? Why do I need a custom residual >>> operator ? >> >> If you have wrapped your coarse-level operator in MatShell, then you can >> just pass that in and use PCMGDefaultResidual. ?Also from the users manual: >> The residual() function can be set to be PCMGDefaultResidual() if one's >> operator is stored in a Mat format. >> In certain circumstances, where it is much cheaper to calculate the residual >> directly, rather than through the >> usual formula b - Ax, the user may wish to provide an alternative. >> Jed > From jed at 59A2.org Tue Dec 7 04:07:30 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 7 Dec 2010 11:07:30 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 07:42, Vijay S. Mahadevan wrote: > When I use PCMG as a preconditioner to solve a fine grid system using > a linear solver, how do I set the interpolation to the linear system > being solved. i.e., my preconditioner starts at max_levels and > hierarchy proceeds to 0 (coarsest) and my linear system is technically > at max_levels+1. > Set nlevels to whatever you want with PCMGSetLevels, then level=nlevels-1 is the fine-level problem, you can set the pre-smoother on that level to PCNONE if you want the "down" part of your cycle to skip it. > I have a vector length incompatibility since I cannot > seem to set the prolongation to go from max_levels to max_levels+1, > where the linear solver subspace resides. > The finest level in PCMG should be the space that your KSP works in. The interpolation operator on that level maps from the next coarsest level, i.e. MatInterpolate(level[n].restrict,level[n-1].x,level[n].x); > I thought of adding the original fine grid problem to the PCMG levels > (now max_levels=max_levels+1) but this by philosophy uses PCMG itself > as a solver. Does it not ? > No, it's still a preconditioner. "Multigrid as a solver" just means "Richardson preconditioned by multigrid". Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From neckel at in.tum.de Tue Dec 7 09:17:18 2010 From: neckel at in.tum.de (Tobias Neckel) Date: Tue, 07 Dec 2010 16:17:18 +0100 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 Message-ID: <4CFE4FFE.9010602@in.tum.de> Hello, we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor problem in some of our test cases: We use seqaij matrix format and GMRES with ILU dt. Our code contained a statement using PCFactorSetUseDropTolerance() which does not exist any longer. 
So we renamed the function call to PCFactorSetDropTolerance (same signature) which has no online docu but found by google ;-) Obviously, this function has not the same functionality as our tests fail. In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available from PETSc" showed a line containing: ILU dt: ILU dt seqaij Sparsekit (table survey) This is not included in the docu of 3.1-p6 (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html ) any more. Does that mean that the ILU dt version is not supported by the (default) petsc? I also checked the changelog but did not find anything there; sorry if I missed sth. Any help or infos will be highly appreciated ;-) Thanks and best regards Tobias From zonexo at gmail.com Tue Dec 7 09:29:34 2010 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 07 Dec 2010 16:29:34 +0100 Subject: [petsc-users] Getting PETSc fortran code to compile in Eclipse with ifort In-Reply-To: <4A1257D9430643B892E650250EBC86BD@spiff> References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: <4CFE52DE.3050302@gmail.com> Hi, I am now trying to get my PETSc fortran code to compile and run in Eclipse photran for Linux. I am using ifort. I have a makefile which I used to compile on a linux cluster. However, I can't get it to work on Photran Eclipse. I got the error msg: **** Build of configuration Debug_Intel64 for project ibm2d_hypre **** make all make: *** No rule to make target `flux_area.o', needed by `airfoil.o'. Stop. Is there anyone with experience in this area? Thank you. Yours sincerely, TAY wee-beng From hzhang at mcs.anl.gov Tue Dec 7 10:10:26 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 7 Dec 2010 10:10:26 -0600 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: <4CFE4FFE.9010602@in.tum.de> References: <4CFE4FFE.9010602@in.tum.de> Message-ID: Tobias : Yes, we replace it with superlu's ilu with drop tolerance. To use it, configure petsc with '--download-superlu --download-parmetis' Then run your code with option -pc_type ilu -pc_factor_mat_solver_package superlu -mat_superlu_ilu_droptol <> see available options with '-help |grep superlu' Suggest to use petsc-dev for such function, because we have fixed several bugs in superlu interface. Hong > > we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor > problem in some of our test cases: We use seqaij matrix format and GMRES > with ILU dt. > > Our code contained a statement using PCFactorSetUseDropTolerance() which > does not exist any longer. So we renamed the function call to > PCFactorSetDropTolerance (same signature) which has no online docu but found > by google ;-) Obviously, this function has not the same functionality as our > tests fail. > > In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available > from PETSc" showed a line containing: > ILU dt: ILU dt ?seqaij ?Sparsekit (table survey) > > This is not included in the docu of 3.1-p6 > (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html > ) any more. Does that mean that the ILU dt version is not supported by the > (default) petsc? I also checked the changelog but did not find anything > there; sorry if I missed sth. > > Any help or infos will be highly appreciated ;-) > > Thanks and best regards > Tobias > From vijay.m at gmail.com Tue Dec 7 13:21:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 7 Dec 2010 13:21:37 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? 
In-Reply-To: References: Message-ID: Jed, that worked perfectly. I had a hunch this is what's needed but glad to see all issues resolved. Again, thanks for the help. Vijay On Tue, Dec 7, 2010 at 4:07 AM, Jed Brown wrote: > On Tue, Dec 7, 2010 at 07:42, Vijay S. Mahadevan wrote: >> >> When I use PCMG as a preconditioner to solve a fine grid system using >> a linear solver, how do I set the interpolation to the linear system >> being solved. i.e., my preconditioner starts at max_levels and >> hierarchy proceeds to 0 (coarsest) and my linear system is technically >> at max_levels+1. > > Set nlevels to whatever you want with PCMGSetLevels, then level=nlevels-1 is > the fine-level problem, you can set the pre-smoother on that level to PCNONE > if you want the "down" part of your cycle to skip it. > >> >> I have a vector length incompatibility since I cannot >> seem to set the prolongation to go from max_levels to max_levels+1, >> where the linear solver subspace resides. > > The finest level in PCMG should be the space that your KSP works in. The > interpolation operator on that level maps from the next coarsest level, i.e. > MatInterpolate(level[n].restrict,level[n-1].x,level[n].x); > >> >> I thought of adding the original fine grid problem to the PCMG levels >> (now max_levels=max_levels+1) but this by philosophy uses PCMG itself >> as a solver. Does it not ? > > No, it's still a preconditioner. "Multigrid as a solver" just means > "Richardson preconditioned by multigrid". > Jed From m.skates82 at gmail.com Tue Dec 7 15:45:02 2010 From: m.skates82 at gmail.com (Nunion) Date: Tue, 7 Dec 2010 15:45:02 -0600 Subject: [petsc-users] MatMAIJ tests Message-ID: Hello all, What would be a fix to get around the issue with complex numbers for ex100.c: Tests various routines in MatMAIJ format in the src/mat/examples/tests/ directory. Currently it is not written for complex numbers. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Tue Dec 7 15:50:57 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 7 Dec 2010 22:50:57 +0100 Subject: [petsc-users] MatMAIJ tests In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 22:45, Nunion wrote: > What would be a fix to get around the issue with complex numbers for ex100.c: > Tests various routines in MatMAIJ format in the src/mat/examples/tests/ directory. > Currently it is not written for complex numbers. > The code in that test would not have to change, but the on-disk binary format would need to change so that real matrices could be loaded. Or a sister file would need to be written that did use complex. Why do you want to run this test with complex? Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.skates82 at gmail.com Tue Dec 7 20:52:20 2010 From: m.skates82 at gmail.com (Nunion) Date: Tue, 7 Dec 2010 20:52:20 -0600 Subject: [petsc-users] MatMAIJ tests In-Reply-To: References: Message-ID: Yes, creating a sister file is the approach I am taking. I am interested in alternate approaches (implementations) for the matrix multi-vector multiply for improving performance. I read that this can be achieved using the MatMult function on matrices created with MatCreateMAIJ. My problem happens to be complex.
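A minimal sketch of that MAIJ usage (again not code from the thread; the AIJ matrix A, the number of vectors dof, and the interleaved input/output vectors X and Y are placeholders, and PetscScalar is complex when PETSc is configured with --with-scalar-type=complex):

  Mat            A,Amaij;   /* A is an ordinary (Seq/MPI)AIJ matrix, m x n      */
  Vec            X,Y;       /* dof vectors interleaved: lengths n*dof and m*dof */
  PetscInt       dof = 4;   /* illustrative number of vectors                   */
  PetscErrorCode ierr;

  ierr = MatCreateMAIJ(A,dof,&Amaij);CHKERRQ(ierr);
  /* one MatMult applies A to all dof interleaved vectors at once */
  ierr = MatMult(Amaij,X,Y);CHKERRQ(ierr);
  ierr = MatDestroy(Amaij);CHKERRQ(ierr);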
On Tue, Dec 7, 2010 at 3:50 PM, Jed Brown wrote: > On Tue, Dec 7, 2010 at 22:45, Nunion wrote: > >> What would be a fix to get around the issue with complex numbers for ex100.c: >> Tests vatious routines in MatMAIJ formatin the src/mat/examples/tests/ directory. >> Currently it is not written for complex numbers. >> > > The code in that test would not have to change, but the on-disk binary > format would need to change so that real matrices could be loaded. Or a > sister file would need to be written that did use complex. Why do you want > to run this test with complex? > > Jed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neckel at in.tum.de Wed Dec 8 02:08:48 2010 From: neckel at in.tum.de (Tobias Neckel) Date: Wed, 08 Dec 2010 09:08:48 +0100 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: References: <4CFE4FFE.9010602@in.tum.de> Message-ID: <4CFF3D10.6000600@in.tum.de> Dear Hong, thanks for your quick answer. > Yes, we replace it with superlu's ilu with drop tolerance. > To use it, configure petsc with '--download-superlu --download-parmetis' > Then run your code with option > -pc_type ilu -pc_factor_mat_solver_package superlu -mat_superlu_ilu_droptol<> > > see available options with '-help |grep superlu' Ah, ok. Is there a possibility to hardwire that in the source code? We run different integration tests with different solver/pc combinations with one single executable call (currently without any options). > Suggest to use petsc-dev for such function, because we have fixed > several bugs in superlu interface. Ok, thanks. Will this functionality be available also in the next release (p7) and when is this expected (approximately)? Thanks and best regards Tobias >> we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor >> problem in some of our test cases: We use seqaij matrix format and GMRES >> with ILU dt. >> >> Our code contained a statement using PCFactorSetUseDropTolerance() which >> does not exist any longer. So we renamed the function call to >> PCFactorSetDropTolerance (same signature) which has no online docu but found >> by google ;-) Obviously, this function has not the same functionality as our >> tests fail. >> >> In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available >> from PETSc" showed a line containing: >> ILU dt: ILU dt seqaij Sparsekit (table survey) >> >> This is not included in the docu of 3.1-p6 >> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html >> ) any more. Does that mean that the ILU dt version is not supported by the >> (default) petsc? I also checked the changelog but did not find anything >> there; sorry if I missed sth. >> >> Any help or infos will be highly appreciated ;-) >> >> Thanks and best regards >> Tobias >> From jakub.pola at gmail.com Thu Dec 9 13:44:22 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Thu, 09 Dec 2010 20:44:22 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! Message-ID: <1291923862.2227.14.camel@desktop> Hello, I was trying to use pets library with my GPU (GTX 460 ) and see how it works. Unfortunatelly I cant run any example with -vec_type cuda. Thisis my log from executing ./ex2 -vec_type cuda [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external package needed for type seehttp://www.mcs.anl.gov/petsc/petsc-as/documentation/installation.html#external! 
[0]PETSC ERROR: Unknown vector type: cuda! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 6, Tue Nov 16 17:02:32 CST 2010 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex2 on a linux-gnu named desktop by kuba Thu Dec 9 20:20:38 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-3.1-p6/linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Thu Dec 9 19:50:31 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: VecSetType() line 46 in src/vec/vec/interface/vecreg.c [0]PETSC ERROR: VecSetTypeFromOptions_Private() line 1335 in src/vec/vec/interface/vector.c [0]PETSC ERROR: VecSetFromOptions() line 1372 in src/vec/vec/interface/vector.c [0]PETSC ERROR: main() line 127 in src/ksp/ksp/examples/tutorials/ex2.c application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0[unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0 I configured my pets with following command: petsc-3.1-p6$ ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no Everything went ok. Then I compiled the library with make PETSC_DIR=/home/kuba/External/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug all and after that tests make PETSC_DIR=/home/kuba/External/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug test Tests were performed successfully. I have cuda configured in LD_LIBRARY_PATH: ldconfig -p | grep cuda libicudata.so.42 (ELF) => /usr/lib/libicudata.so.42 libcusparse.so.3 (libc6) => /usr/local/cuda/lib/libcusparse.so.3 libcusparse.so (libc6) => /usr/local/cuda/lib/libcusparse.so libcurand.so.3 (libc6) => /usr/local/cuda/lib/libcurand.so.3 libcurand.so (libc6) => /usr/local/cuda/lib/libcurand.so libcufft.so.3 (libc6) => /usr/local/cuda/lib/libcufft.so.3 libcufft.so (libc6) => /usr/local/cuda/lib/libcufft.so libcudart.so.3 (libc6) => /usr/local/cuda/lib/libcudart.so.3 libcudart.so (libc6) => /usr/local/cuda/lib/libcudart.so libcuda.so.1 (libc6) => /usr/lib/nvidia-current/libcuda.so.1 libcuda.so (libc6) => /usr/lib/nvidia-current/libcuda.so libcublas.so.3 (libc6) => /usr/local/cuda/lib/libcublas.so.3 libcublas.so (libc6) => /usr/local/cuda/lib/libcublas.so and I have cuda directories in PATH: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/kuba/bin:/usr/local/cuda/bin:/usr/local/cuda/include:/usr/local/cuda/lib:/usr/local/cuda/lib64" Could you please help me with that. My final destination is to run CG and BICGSTAB solvers and measure performance using sparse matrices. Thank you in advance for your help. Best regards, Jakub. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Dec 9 13:47:27 2010 From: jed at 59A2.org (Jed Brown) Date: Thu, 9 Dec 2010 20:47:27 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! 
In-Reply-To: <1291923862.2227.14.camel@desktop> References: <1291923862.2227.14.camel@desktop> Message-ID: On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > Petsc Release Version 3.1.0 You need petsc-dev for this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Thu Dec 9 17:07:45 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:07:45 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> Message-ID: <1291936065.2227.31.camel@desktop> Hi again, I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 I have problems with compiling the library. I did following steps: configured pets: ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust then make: make PETSC_DIR=/home/kuba/External/petsc-dev PETSC_ARCH=arch-linux-gnu-c-debug all here I have problems because compilator says that /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal error: iterator: No such file or directory But actually it is as a symbolic link: ls -l /usr/local/cuda/include/ gives lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust -> /home/kuba/External/thrust/thrust/ and kuba at desktop:~/External/thrust/thrust/iterator$ ls -l razem 96 -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h kuba at desktop:~/External/thrust/thrust/iterator$ Have somebody faced this kind of problem? Here it is compilation log to first error kuba at desktop:~/External/petsc-dev$ make PETSC_DIR=/home/kuba/External/petsc-dev PETSC_ARCH=arch-linux-gnu-c-debug all ========================================== See documentation/faq.html and documentation/bugreporting.html for help with installation problems. 
Please send EVERYTHING printed out below when reporting problems To subscribe to the PETSc announcement list, send mail to majordomo at mcs.anl.gov with the message: subscribe petsc-announce To subscribe to the PETSc users mailing list, send mail to majordomo at mcs.anl.gov with the message: subscribe petsc-users ========================================== On czw, 9 gru 2010, 23:56:38 CET on desktop Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux ----------------------------------------- Using PETSc directory: /home/kuba/External/petsc-dev Using PETSc arch: arch-linux-gnu-c-debug ----------------------------------------- PETSC_VERSION_RELEASE 0 PETSC_VERSION_MAJOR 3 PETSC_VERSION_MINOR 1 PETSC_VERSION_SUBMINOR 0 PETSC_VERSION_PATCH 6 PETSC_VERSION_DATE "Mar, 25, 2010" PETSC_VERSION_PATCH_DATE "unknown" PETSC_VERSION_HG "unknown" PETSC_VERSION_DATE_HG "unknown" PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ ----------------------------------------- Using configure Options: --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust Using configuration flags: #define INCLUDED_PETSCCONF_H #define IS_COLORING_MAX 65535 #define STDC_HEADERS 1 #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT #define PETSC_UINTPTR_T uintptr_t #define PETSC_HAVE_PTHREAD 1 #define PETSC_STATIC_INLINE static inline #define PETSC_REPLACE_DIR_SEPARATOR '\\' #define PETSC_RESTRICT __restrict__ #define PETSC_HAVE_MPI 1 #define PETSC_USE_SINGLE_LIBRARY 1 #define PETSC_USE_SOCKET_VIEWER 1 #define PETSC_HAVE_THRUST 1 #define PETSC_LIB_DIR "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" #define PETSC_HAVE_FORTRAN 1 #define PETSC_HAVE_SOWING 1 #define PETSC_SLSUFFIX "" #define PETSC_FUNCTION_NAME_CXX __func__ #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 #define PETSC_UNUSED #define PETSC_HAVE_CUDA 1 #define PETSC_FUNCTION_NAME_C __func__ #define PETSC_HAVE_C2HTML 1 #define PETSC_HAVE_VALGRIND 1 #define PETSC_HAVE_BUILTIN_EXPECT 1 #define PETSC_DIR_SEPARATOR '/' #define PETSC_PATH_SEPARATOR ':' #define PETSC_HAVE_X11 1 #define PETSC_HAVE_CUSP 1 #define PETSC_Prefetch(a,b,c) #define PETSC_HAVE_BLASLAPACK 1 #define PETSC_HAVE_STRING_H 1 #define PETSC_HAVE_SYS_TYPES_H 1 #define PETSC_HAVE_ENDIAN_H 1 #define PETSC_HAVE_SYS_PROCFS_H 1 #define PETSC_HAVE_DLFCN_H 1 #define PETSC_HAVE_STDINT_H 1 #define PETSC_HAVE_LINUX_KERNEL_H 1 #define PETSC_HAVE_TIME_H 1 #define PETSC_HAVE_MATH_H 1 #define PETSC_HAVE_STDLIB_H 1 #define PETSC_HAVE_SYS_PARAM_H 1 #define PETSC_HAVE_SYS_SOCKET_H 1 #define PETSC_HAVE_UNISTD_H 1 #define PETSC_HAVE_SYS_WAIT_H 1 #define PETSC_HAVE_LIMITS_H 1 #define PETSC_HAVE_SYS_UTSNAME_H 1 #define PETSC_HAVE_NETINET_IN_H 1 #define PETSC_HAVE_FENV_H 1 #define PETSC_HAVE_FLOAT_H 1 #define PETSC_HAVE_SEARCH_H 1 #define PETSC_HAVE_SYS_SYSINFO_H 1 #define PETSC_HAVE_SYS_RESOURCE_H 1 #define PETSC_HAVE_SYS_TIMES_H 1 #define PETSC_HAVE_NETDB_H 1 #define PETSC_HAVE_MALLOC_H 1 #define PETSC_HAVE_PWD_H 1 #define PETSC_HAVE_FCNTL_H 1 #define PETSC_HAVE_STRINGS_H 1 #define PETSC_HAVE_MEMORY_H 1 #define PETSC_TIME_WITH_SYS_TIME 1 #define PETSC_HAVE_SYS_TIME_H 1 #define PETSC_USING_F90 1 #define PETSC_HAVE_RTLD_NOW 1 #define PETSC_HAVE_RTLD_LOCAL 1 #define PETSC_HAVE_RTLD_LAZY 1 #define PETSC_C_STATIC_INLINE static inline #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 #define PETSC_HAVE_CXX_NAMESPACE 1 #define PETSC_HAVE_RTLD_GLOBAL 1 
#define PETSC_C_RESTRICT __restrict__ #define PETSC_CXX_RESTRICT __restrict__ #define PETSC_CXX_STATIC_INLINE static inline #define PETSC_HAVE_LIBCUBLAS 1 #define PETSC_HAVE_LIBCUDART 1 #define PETSC_HAVE_LIBDL 1 #define PETSC_HAVE_LIBFBLAS 1 #define PETSC_HAVE_LIBFLAPACK 1 #define PETSC_HAVE_ERF 1 #define PETSC_HAVE_LIBCUFFT 1 #define PETSC_HAVE_LIBRT 1 #define PETSC_ARCH "arch-linux-gnu-c-debug" #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" #define PETSC_DIR "/home/kuba/External/petsc-dev" #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" #define HAVE_GZIP 1 #define PETSC_CLANGUAGE_C 1 #define PETSC_USE_EXTERN_CXX #define PETSC_USE_ERRORCHECKING 1 #define PETSC_MISSING_DREAL 1 #define PETSC_SIZEOF_MPI_COMM 4 #define PETSC_BITS_PER_BYTE 8 #define PETSC_SIZEOF_MPI_FINT 4 #define PETSC_SIZEOF_VOID_P 4 #define PETSC_RETSIGTYPE void #define PETSC_HAVE_CXX_COMPLEX 1 #define PETSC_SIZEOF_LONG 4 #define PETSC_USE_FORTRANKIND 1 #define PETSC_SIZEOF_SIZE_T 4 #define PETSC_SIZEOF_CHAR 1 #define PETSC_SIZEOF_DOUBLE 8 #define PETSC_SIZEOF_FLOAT 4 #define PETSC_HAVE_C99_COMPLEX 1 #define PETSC_SIZEOF_INT 4 #define PETSC_SIZEOF_LONG_LONG 8 #define PETSC_SIZEOF_SHORT 2 #define PETSC_HAVE_STRCASECMP 1 #define PETSC_HAVE_POPEN 1 #define PETSC_HAVE_SIGSET 1 #define PETSC_HAVE_GETWD 1 #define PETSC_HAVE_VSNPRINTF 1 #define PETSC_HAVE_TIMES 1 #define PETSC_HAVE_DLSYM 1 #define PETSC_HAVE_SNPRINTF 1 #define PETSC_HAVE_GETPWUID 1 #define PETSC_HAVE_GETHOSTBYNAME 1 #define PETSC_HAVE_SLEEP 1 #define PETSC_HAVE_DLERROR 1 #define PETSC_HAVE_FORK 1 #define PETSC_HAVE_RAND 1 #define PETSC_HAVE_GETTIMEOFDAY 1 #define PETSC_HAVE_DLCLOSE 1 #define PETSC_HAVE_UNAME 1 #define PETSC_HAVE_GETHOSTNAME 1 #define PETSC_HAVE_MKSTEMP 1 #define PETSC_HAVE_SIGACTION 1 #define PETSC_HAVE_DRAND48 1 #define PETSC_HAVE_NANOSLEEP 1 #define PETSC_HAVE_VA_COPY 1 #define PETSC_HAVE_CLOCK 1 #define PETSC_HAVE_ACCESS 1 #define PETSC_HAVE_SIGNAL 1 #define PETSC_HAVE_USLEEP 1 #define PETSC_HAVE_GETRUSAGE 1 #define PETSC_HAVE_VFPRINTF 1 #define PETSC_HAVE_MEMALIGN 1 #define PETSC_HAVE_GETDOMAINNAME 1 #define PETSC_HAVE_TIME 1 #define PETSC_HAVE_LSEEK 1 #define PETSC_HAVE_SOCKET 1 #define PETSC_HAVE_SYSINFO 1 #define PETSC_HAVE_READLINK 1 #define PETSC_HAVE_REALPATH 1 #define PETSC_HAVE_DLOPEN 1 #define PETSC_HAVE_MEMMOVE 1 #define PETSC_HAVE__GFORTRAN_IARGC 1 #define PETSC_SIGNAL_CAST #define PETSC_HAVE_GETCWD 1 #define PETSC_HAVE_VPRINTF 1 #define PETSC_HAVE_BZERO 1 #define PETSC_HAVE_GETPAGESIZE 1 #define PETSC_LEVEL1_DCACHE_LINESIZE 64 #define PETSC_LEVEL1_DCACHE_SIZE 32768 #define PETSC_LEVEL1_DCACHE_ASSOC 8 #define PETSC_USE_PROC_FOR_SIZE 1 #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 #define PETSC_HAVE_SHARED_LIBRARIES 1 #define PETSC_MEMALIGN 16 #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 #define PETSC_HAVE_GFORTRAN_IARGC 1 #define PETSC_HAVE_ISINF 1 #define PETSC_HAVE_ISNAN 1 #define PETSC_HAVE_MPI_COMM_C2F 1 #define PETSC_HAVE_MPI_LONG_DOUBLE 1 #define PETSC_HAVE_MPI_COMM_F2C 1 #define PETSC_HAVE_MPI_FINT 1 #define PETSC_HAVE_MPI_F90MODULE 1 #define PETSC_HAVE_MPI_FINALIZED 1 #define PETSC_HAVE_MPI_COMM_SPAWN 1 #define PETSC_HAVE_MPI_WIN_CREATE 1 #define PETSC_HAVE_MPIIO 1 #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 #define PETSC_HAVE_MPI_ALLTOALLW 1 #define PETSC_HAVE_MPI_IN_PLACE 1 #define PETSC_USE_INFO 1 #define PETSC_PETSC_USE_BACKWARD_LOOP 1 #define 
PETSC_Alignx(a,b) #define PETSC_USE_DEBUG 1 #define PETSC_USE_LOG 1 #define PETSC_IS_COLOR_VALUE_TYPE short #define PETSC_USE_CTABLE 1 #define PETSC_USE_GDB_DEBUGGER 1 #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { #define PETSC_CUDA_EXTERN_C_END } #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 #define PETSC_BLASLAPACK_UNDERSCORE 1 ----------------------------------------- Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ -I/usr/local/cuda/include/thrust/ Using C/C++ compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 Using Fortran include/module paths: -I/home/kuba/External/petsc-dev/include -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ -I/usr/local/cuda/include/thrust/ Using Fortran compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g ----------------------------------------- Using C/C++ linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 Using Fortran linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 Using Fortran flags: -Wall -Wno-unused-variable -g ----------------------------------------- Using libraries: -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft -lcublas -lcudart -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl ------------------------------------------ Using mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec ========================================== /bin/rm -f -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* /bin/rm -f -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES ========================================= libfast in: /home/kuba/External/petsc-dev/src libfast in: /home/kuba/External/petsc-dev/src/inline libfast in: /home/kuba/External/petsc-dev/src/sys libfast in: /home/kuba/External/petsc-dev/src/sys/viewer libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket In file included from /usr/local/cuda/include/cusp/detail/config.h:24, from /usr/local/cuda/include/cusp/memory.h:20, from /home/kuba/External/petsc-dev/include/petscsys.h:1671, from send.c:3: /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?thrust? In file included from /usr/local/cuda/include/cusp/memory.h:22, from /home/kuba/External/petsc-dev/include/petscsys.h:1671, from send.c:3: /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal error: iterator: No such file or directory compilation terminated. Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > Petsc Release Version 3.1.0 > > You need petsc-dev for this. 
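Putting the advice from the replies below together (use a petsc-dev checkout, let configure pick up CUSP and Thrust from the default CUDA include directory instead of passing --with-cusp-dir/--with-thrust-dir, and note the spelling --with-cusp, not --with-cups), the corrected configure invocation would look roughly like this, with the remaining options left as in the original attempt:

  ./config/configure.py --with-cc=gcc --with-fc=gfortran \
    --download-f-blas-lapack=1 --download-mpich=1 \
    --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-debug=no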
From balay at mcs.anl.gov Thu Dec 9 17:29:11 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 9 Dec 2010 17:29:11 -0600 (CST) Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291936065.2227.31.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: Try removing options: --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust [and use --with-cups=1 --with-thrust=1] As they are in default location - configure picks them automatically. But since they are incorrectly specified - the 'default cuda path' gets the configure tests going - but this path causes grief later.. > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > from /usr/local/cuda/include/cusp/memory.h:20, Thats suspporsed to system memory.h - not cups/memory.h Satish On Fri, 10 Dec 2010, Jakub Pola wrote: > Hi again, > > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 > I have problems with compiling the library. I did following steps: > > configured pets: > > ./config/configure.py --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > --with-thrust-dir=/usr/local/cuda/include/thrust > > then make: > make PETSC_DIR=/home/kuba/External/petsc-dev > PETSC_ARCH=arch-linux-gnu-c-debug all > > here I have problems because compilator says that > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > error: iterator: No such file or directory > > But actually it is as a symbolic link: > ls -l /usr/local/cuda/include/ > gives > lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust > -> /home/kuba/External/thrust/thrust/ > > and > > kuba at desktop:~/External/thrust/thrust/iterator$ ls -l > razem 96 > -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h > -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h > drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail > -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h > -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h > -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h > -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h > -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h > -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h > -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h > -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h > kuba at desktop:~/External/thrust/thrust/iterator$ > > > Have somebody faced this kind of problem? > > > > > Here it is compilation log to first error > > kuba at desktop:~/External/petsc-dev$ make > PETSC_DIR=/home/kuba/External/petsc-dev > PETSC_ARCH=arch-linux-gnu-c-debug all > ========================================== > > See documentation/faq.html and documentation/bugreporting.html > for help with installation problems. 
Please send EVERYTHING > printed out below when reporting problems > > To subscribe to the PETSc announcement list, send mail to > majordomo at mcs.anl.gov with the message: > subscribe petsc-announce > > To subscribe to the PETSc users mailing list, send mail to > majordomo at mcs.anl.gov with the message: > subscribe petsc-users > > ========================================== > On czw, 9 gru 2010, 23:56:38 CET on desktop > Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP > Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux > ----------------------------------------- > Using PETSc directory: /home/kuba/External/petsc-dev > Using PETSc arch: arch-linux-gnu-c-debug > ----------------------------------------- > PETSC_VERSION_RELEASE 0 > PETSC_VERSION_MAJOR 3 > PETSC_VERSION_MINOR 1 > PETSC_VERSION_SUBMINOR 0 > PETSC_VERSION_PATCH 6 > PETSC_VERSION_DATE "Mar, 25, 2010" > PETSC_VERSION_PATCH_DATE "unknown" > PETSC_VERSION_HG "unknown" > PETSC_VERSION_DATE_HG "unknown" > PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ > ----------------------------------------- > Using configure Options: --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > --with-thrust-dir=/usr/local/cuda/include/thrust > Using configuration flags: > #define INCLUDED_PETSCCONF_H > #define IS_COLORING_MAX 65535 > #define STDC_HEADERS 1 > #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT > #define PETSC_UINTPTR_T uintptr_t > #define PETSC_HAVE_PTHREAD 1 > #define PETSC_STATIC_INLINE static inline > #define PETSC_REPLACE_DIR_SEPARATOR '\\' > #define PETSC_RESTRICT __restrict__ > #define PETSC_HAVE_MPI 1 > #define PETSC_USE_SINGLE_LIBRARY 1 > #define PETSC_USE_SOCKET_VIEWER 1 > #define PETSC_HAVE_THRUST 1 > #define PETSC_LIB_DIR > "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" > #define PETSC_HAVE_FORTRAN 1 > #define PETSC_HAVE_SOWING 1 > #define PETSC_SLSUFFIX "" > #define PETSC_FUNCTION_NAME_CXX __func__ > #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 > #define PETSC_UNUSED > #define PETSC_HAVE_CUDA 1 > #define PETSC_FUNCTION_NAME_C __func__ > #define PETSC_HAVE_C2HTML 1 > #define PETSC_HAVE_VALGRIND 1 > #define PETSC_HAVE_BUILTIN_EXPECT 1 > #define PETSC_DIR_SEPARATOR '/' > #define PETSC_PATH_SEPARATOR ':' > #define PETSC_HAVE_X11 1 > #define PETSC_HAVE_CUSP 1 > #define PETSC_Prefetch(a,b,c) > #define PETSC_HAVE_BLASLAPACK 1 > #define PETSC_HAVE_STRING_H 1 > #define PETSC_HAVE_SYS_TYPES_H 1 > #define PETSC_HAVE_ENDIAN_H 1 > #define PETSC_HAVE_SYS_PROCFS_H 1 > #define PETSC_HAVE_DLFCN_H 1 > #define PETSC_HAVE_STDINT_H 1 > #define PETSC_HAVE_LINUX_KERNEL_H 1 > #define PETSC_HAVE_TIME_H 1 > #define PETSC_HAVE_MATH_H 1 > #define PETSC_HAVE_STDLIB_H 1 > #define PETSC_HAVE_SYS_PARAM_H 1 > #define PETSC_HAVE_SYS_SOCKET_H 1 > #define PETSC_HAVE_UNISTD_H 1 > #define PETSC_HAVE_SYS_WAIT_H 1 > #define PETSC_HAVE_LIMITS_H 1 > #define PETSC_HAVE_SYS_UTSNAME_H 1 > #define PETSC_HAVE_NETINET_IN_H 1 > #define PETSC_HAVE_FENV_H 1 > #define PETSC_HAVE_FLOAT_H 1 > #define PETSC_HAVE_SEARCH_H 1 > #define PETSC_HAVE_SYS_SYSINFO_H 1 > #define PETSC_HAVE_SYS_RESOURCE_H 1 > #define PETSC_HAVE_SYS_TIMES_H 1 > #define PETSC_HAVE_NETDB_H 1 > #define PETSC_HAVE_MALLOC_H 1 > #define PETSC_HAVE_PWD_H 1 > #define PETSC_HAVE_FCNTL_H 1 > #define PETSC_HAVE_STRINGS_H 1 > #define PETSC_HAVE_MEMORY_H 1 > #define PETSC_TIME_WITH_SYS_TIME 1 > #define PETSC_HAVE_SYS_TIME_H 1 > #define PETSC_USING_F90 1 > #define PETSC_HAVE_RTLD_NOW 1 > #define 
PETSC_HAVE_RTLD_LOCAL 1 > #define PETSC_HAVE_RTLD_LAZY 1 > #define PETSC_C_STATIC_INLINE static inline > #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 > #define PETSC_HAVE_CXX_NAMESPACE 1 > #define PETSC_HAVE_RTLD_GLOBAL 1 > #define PETSC_C_RESTRICT __restrict__ > #define PETSC_CXX_RESTRICT __restrict__ > #define PETSC_CXX_STATIC_INLINE static inline > #define PETSC_HAVE_LIBCUBLAS 1 > #define PETSC_HAVE_LIBCUDART 1 > #define PETSC_HAVE_LIBDL 1 > #define PETSC_HAVE_LIBFBLAS 1 > #define PETSC_HAVE_LIBFLAPACK 1 > #define PETSC_HAVE_ERF 1 > #define PETSC_HAVE_LIBCUFFT 1 > #define PETSC_HAVE_LIBRT 1 > #define PETSC_ARCH "arch-linux-gnu-c-debug" > #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" > #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" > #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" > #define PETSC_DIR "/home/kuba/External/petsc-dev" > #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" > #define HAVE_GZIP 1 > #define PETSC_CLANGUAGE_C 1 > #define PETSC_USE_EXTERN_CXX > #define PETSC_USE_ERRORCHECKING 1 > #define PETSC_MISSING_DREAL 1 > #define PETSC_SIZEOF_MPI_COMM 4 > #define PETSC_BITS_PER_BYTE 8 > #define PETSC_SIZEOF_MPI_FINT 4 > #define PETSC_SIZEOF_VOID_P 4 > #define PETSC_RETSIGTYPE void > #define PETSC_HAVE_CXX_COMPLEX 1 > #define PETSC_SIZEOF_LONG 4 > #define PETSC_USE_FORTRANKIND 1 > #define PETSC_SIZEOF_SIZE_T 4 > #define PETSC_SIZEOF_CHAR 1 > #define PETSC_SIZEOF_DOUBLE 8 > #define PETSC_SIZEOF_FLOAT 4 > #define PETSC_HAVE_C99_COMPLEX 1 > #define PETSC_SIZEOF_INT 4 > #define PETSC_SIZEOF_LONG_LONG 8 > #define PETSC_SIZEOF_SHORT 2 > #define PETSC_HAVE_STRCASECMP 1 > #define PETSC_HAVE_POPEN 1 > #define PETSC_HAVE_SIGSET 1 > #define PETSC_HAVE_GETWD 1 > #define PETSC_HAVE_VSNPRINTF 1 > #define PETSC_HAVE_TIMES 1 > #define PETSC_HAVE_DLSYM 1 > #define PETSC_HAVE_SNPRINTF 1 > #define PETSC_HAVE_GETPWUID 1 > #define PETSC_HAVE_GETHOSTBYNAME 1 > #define PETSC_HAVE_SLEEP 1 > #define PETSC_HAVE_DLERROR 1 > #define PETSC_HAVE_FORK 1 > #define PETSC_HAVE_RAND 1 > #define PETSC_HAVE_GETTIMEOFDAY 1 > #define PETSC_HAVE_DLCLOSE 1 > #define PETSC_HAVE_UNAME 1 > #define PETSC_HAVE_GETHOSTNAME 1 > #define PETSC_HAVE_MKSTEMP 1 > #define PETSC_HAVE_SIGACTION 1 > #define PETSC_HAVE_DRAND48 1 > #define PETSC_HAVE_NANOSLEEP 1 > #define PETSC_HAVE_VA_COPY 1 > #define PETSC_HAVE_CLOCK 1 > #define PETSC_HAVE_ACCESS 1 > #define PETSC_HAVE_SIGNAL 1 > #define PETSC_HAVE_USLEEP 1 > #define PETSC_HAVE_GETRUSAGE 1 > #define PETSC_HAVE_VFPRINTF 1 > #define PETSC_HAVE_MEMALIGN 1 > #define PETSC_HAVE_GETDOMAINNAME 1 > #define PETSC_HAVE_TIME 1 > #define PETSC_HAVE_LSEEK 1 > #define PETSC_HAVE_SOCKET 1 > #define PETSC_HAVE_SYSINFO 1 > #define PETSC_HAVE_READLINK 1 > #define PETSC_HAVE_REALPATH 1 > #define PETSC_HAVE_DLOPEN 1 > #define PETSC_HAVE_MEMMOVE 1 > #define PETSC_HAVE__GFORTRAN_IARGC 1 > #define PETSC_SIGNAL_CAST > #define PETSC_HAVE_GETCWD 1 > #define PETSC_HAVE_VPRINTF 1 > #define PETSC_HAVE_BZERO 1 > #define PETSC_HAVE_GETPAGESIZE 1 > #define PETSC_LEVEL1_DCACHE_LINESIZE 64 > #define PETSC_LEVEL1_DCACHE_SIZE 32768 > #define PETSC_LEVEL1_DCACHE_ASSOC 8 > #define PETSC_USE_PROC_FOR_SIZE 1 > #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 > #define PETSC_HAVE_SHARED_LIBRARIES 1 > #define PETSC_MEMALIGN 16 > #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 > #define PETSC_HAVE_GFORTRAN_IARGC 1 > #define PETSC_HAVE_ISINF 1 > #define PETSC_HAVE_ISNAN 1 > #define PETSC_HAVE_MPI_COMM_C2F 1 > #define PETSC_HAVE_MPI_LONG_DOUBLE 1 > #define 
PETSC_HAVE_MPI_COMM_F2C 1 > #define PETSC_HAVE_MPI_FINT 1 > #define PETSC_HAVE_MPI_F90MODULE 1 > #define PETSC_HAVE_MPI_FINALIZED 1 > #define PETSC_HAVE_MPI_COMM_SPAWN 1 > #define PETSC_HAVE_MPI_WIN_CREATE 1 > #define PETSC_HAVE_MPIIO 1 > #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 > #define PETSC_HAVE_MPI_ALLTOALLW 1 > #define PETSC_HAVE_MPI_IN_PLACE 1 > #define PETSC_USE_INFO 1 > #define PETSC_PETSC_USE_BACKWARD_LOOP 1 > #define PETSC_Alignx(a,b) > #define PETSC_USE_DEBUG 1 > #define PETSC_USE_LOG 1 > #define PETSC_IS_COLOR_VALUE_TYPE short > #define PETSC_USE_CTABLE 1 > #define PETSC_USE_GDB_DEBUGGER 1 > #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { > #define PETSC_CUDA_EXTERN_C_END } > #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 > #define PETSC_BLASLAPACK_UNDERSCORE 1 > ----------------------------------------- > Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > -I/usr/local/cuda/include/thrust/ > Using C/C++ > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 > Using Fortran include/module paths: > -I/home/kuba/External/petsc-dev/include > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > -I/usr/local/cuda/include/thrust/ > Using Fortran > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g > ----------------------------------------- > Using C/C++ > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -g3 > Using Fortran > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 > Using Fortran flags: -Wall -Wno-unused-variable -g > ----------------------------------------- > Using libraries: > -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc > -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft > -lcublas -lcudart > -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 > -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread > -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx > -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > ------------------------------------------ > Using > mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec > ========================================== > /bin/rm -f > -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* > /bin/rm -f > -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod > BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES > ========================================= > libfast in: /home/kuba/External/petsc-dev/src > libfast in: /home/kuba/External/petsc-dev/src/inline > libfast in: /home/kuba/External/petsc-dev/src/sys > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > from /usr/local/cuda/include/cusp/memory.h:20, > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > from send.c:3: > /usr/local/cuda/include/thrust/version.h:69: error: 
expected ?=?, ?,?, > ?;?, ?asm? or ?__attribute__? before ?thrust? > In file included from /usr/local/cuda/include/cusp/memory.h:22, > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > from send.c:3: > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > error: iterator: No such file or directory > compilation terminated. > > > > Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > > Petsc Release Version 3.1.0 > > > > You need petsc-dev for this. > > > From jed at 59A2.org Thu Dec 9 17:38:52 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 00:38:52 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291936065.2227.31.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: On Fri, Dec 10, 2010 at 00:07, Jakub Pola wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 As I said before, you need petsc-dev for this, CUDA support is not in 3.1. http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Thu Dec 9 17:44:12 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:44:12 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291938252.2227.33.camel@desktop> When I used your suggestion I got following error. kuba at desktop:~/External/petsc-dev$ ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cups=1 --with-thrust=1 =============================================================================== Configuring PETSc to compile on your system =============================================================================== TESTING: checkInclude from config.headers(config/BuildSystem/config/headers.py:82) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- PETSc CUDA support requires the CUSP and Thrust packages Rerun configure using --with-cusp-dir and --with-thrust-dir ******************************************************************************* Dnia 2010-12-09, czw o godzinie 17:29 -0600, Satish Balay pisze: > Try removing options: > --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust > [and use --with-cups=1 --with-thrust=1] > > As they are in default location - configure picks them automatically. But > since they are incorrectly specified - the 'default cuda path' gets > the configure tests going - but this path causes grief later.. > > > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > > from /usr/local/cuda/include/cusp/memory.h:20, > > > Thats suspporsed to system memory.h - not cups/memory.h > > Satish > > On Fri, 10 Dec 2010, Jakub Pola wrote: > > > Hi again, > > > > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 > > I have problems with compiling the library. 
I did following steps: > > > > configured pets: > > > > ./config/configure.py --with-cc=gcc --with-fc=gfortran > > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > > --with-thrust-dir=/usr/local/cuda/include/thrust > > > > then make: > > make PETSC_DIR=/home/kuba/External/petsc-dev > > PETSC_ARCH=arch-linux-gnu-c-debug all > > > > here I have problems because compilator says that > > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > > error: iterator: No such file or directory > > > > But actually it is as a symbolic link: > > ls -l /usr/local/cuda/include/ > > gives > > lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust > > -> /home/kuba/External/thrust/thrust/ > > > > and > > > > kuba at desktop:~/External/thrust/thrust/iterator$ ls -l > > razem 96 > > -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h > > -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h > > drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail > > -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h > > -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h > > -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h > > -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h > > -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h > > -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h > > -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h > > -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h > > kuba at desktop:~/External/thrust/thrust/iterator$ > > > > > > Have somebody faced this kind of problem? > > > > > > > > > > Here it is compilation log to first error > > > > kuba at desktop:~/External/petsc-dev$ make > > PETSC_DIR=/home/kuba/External/petsc-dev > > PETSC_ARCH=arch-linux-gnu-c-debug all > > ========================================== > > > > See documentation/faq.html and documentation/bugreporting.html > > for help with installation problems. 
Please send EVERYTHING > > printed out below when reporting problems > > > > To subscribe to the PETSc announcement list, send mail to > > majordomo at mcs.anl.gov with the message: > > subscribe petsc-announce > > > > To subscribe to the PETSc users mailing list, send mail to > > majordomo at mcs.anl.gov with the message: > > subscribe petsc-users > > > > ========================================== > > On czw, 9 gru 2010, 23:56:38 CET on desktop > > Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP > > Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux > > ----------------------------------------- > > Using PETSc directory: /home/kuba/External/petsc-dev > > Using PETSc arch: arch-linux-gnu-c-debug > > ----------------------------------------- > > PETSC_VERSION_RELEASE 0 > > PETSC_VERSION_MAJOR 3 > > PETSC_VERSION_MINOR 1 > > PETSC_VERSION_SUBMINOR 0 > > PETSC_VERSION_PATCH 6 > > PETSC_VERSION_DATE "Mar, 25, 2010" > > PETSC_VERSION_PATCH_DATE "unknown" > > PETSC_VERSION_HG "unknown" > > PETSC_VERSION_DATE_HG "unknown" > > PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ > > ----------------------------------------- > > Using configure Options: --with-cc=gcc --with-fc=gfortran > > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > > --with-thrust-dir=/usr/local/cuda/include/thrust > > Using configuration flags: > > #define INCLUDED_PETSCCONF_H > > #define IS_COLORING_MAX 65535 > > #define STDC_HEADERS 1 > > #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT > > #define PETSC_UINTPTR_T uintptr_t > > #define PETSC_HAVE_PTHREAD 1 > > #define PETSC_STATIC_INLINE static inline > > #define PETSC_REPLACE_DIR_SEPARATOR '\\' > > #define PETSC_RESTRICT __restrict__ > > #define PETSC_HAVE_MPI 1 > > #define PETSC_USE_SINGLE_LIBRARY 1 > > #define PETSC_USE_SOCKET_VIEWER 1 > > #define PETSC_HAVE_THRUST 1 > > #define PETSC_LIB_DIR > > "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" > > #define PETSC_HAVE_FORTRAN 1 > > #define PETSC_HAVE_SOWING 1 > > #define PETSC_SLSUFFIX "" > > #define PETSC_FUNCTION_NAME_CXX __func__ > > #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 > > #define PETSC_UNUSED > > #define PETSC_HAVE_CUDA 1 > > #define PETSC_FUNCTION_NAME_C __func__ > > #define PETSC_HAVE_C2HTML 1 > > #define PETSC_HAVE_VALGRIND 1 > > #define PETSC_HAVE_BUILTIN_EXPECT 1 > > #define PETSC_DIR_SEPARATOR '/' > > #define PETSC_PATH_SEPARATOR ':' > > #define PETSC_HAVE_X11 1 > > #define PETSC_HAVE_CUSP 1 > > #define PETSC_Prefetch(a,b,c) > > #define PETSC_HAVE_BLASLAPACK 1 > > #define PETSC_HAVE_STRING_H 1 > > #define PETSC_HAVE_SYS_TYPES_H 1 > > #define PETSC_HAVE_ENDIAN_H 1 > > #define PETSC_HAVE_SYS_PROCFS_H 1 > > #define PETSC_HAVE_DLFCN_H 1 > > #define PETSC_HAVE_STDINT_H 1 > > #define PETSC_HAVE_LINUX_KERNEL_H 1 > > #define PETSC_HAVE_TIME_H 1 > > #define PETSC_HAVE_MATH_H 1 > > #define PETSC_HAVE_STDLIB_H 1 > > #define PETSC_HAVE_SYS_PARAM_H 1 > > #define PETSC_HAVE_SYS_SOCKET_H 1 > > #define PETSC_HAVE_UNISTD_H 1 > > #define PETSC_HAVE_SYS_WAIT_H 1 > > #define PETSC_HAVE_LIMITS_H 1 > > #define PETSC_HAVE_SYS_UTSNAME_H 1 > > #define PETSC_HAVE_NETINET_IN_H 1 > > #define PETSC_HAVE_FENV_H 1 > > #define PETSC_HAVE_FLOAT_H 1 > > #define PETSC_HAVE_SEARCH_H 1 > > #define PETSC_HAVE_SYS_SYSINFO_H 1 > > #define PETSC_HAVE_SYS_RESOURCE_H 1 > > #define PETSC_HAVE_SYS_TIMES_H 1 > > #define PETSC_HAVE_NETDB_H 1 > > #define PETSC_HAVE_MALLOC_H 1 > > #define PETSC_HAVE_PWD_H 1 > > #define PETSC_HAVE_FCNTL_H 1 > > #define 
PETSC_HAVE_STRINGS_H 1 > > #define PETSC_HAVE_MEMORY_H 1 > > #define PETSC_TIME_WITH_SYS_TIME 1 > > #define PETSC_HAVE_SYS_TIME_H 1 > > #define PETSC_USING_F90 1 > > #define PETSC_HAVE_RTLD_NOW 1 > > #define PETSC_HAVE_RTLD_LOCAL 1 > > #define PETSC_HAVE_RTLD_LAZY 1 > > #define PETSC_C_STATIC_INLINE static inline > > #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 > > #define PETSC_HAVE_CXX_NAMESPACE 1 > > #define PETSC_HAVE_RTLD_GLOBAL 1 > > #define PETSC_C_RESTRICT __restrict__ > > #define PETSC_CXX_RESTRICT __restrict__ > > #define PETSC_CXX_STATIC_INLINE static inline > > #define PETSC_HAVE_LIBCUBLAS 1 > > #define PETSC_HAVE_LIBCUDART 1 > > #define PETSC_HAVE_LIBDL 1 > > #define PETSC_HAVE_LIBFBLAS 1 > > #define PETSC_HAVE_LIBFLAPACK 1 > > #define PETSC_HAVE_ERF 1 > > #define PETSC_HAVE_LIBCUFFT 1 > > #define PETSC_HAVE_LIBRT 1 > > #define PETSC_ARCH "arch-linux-gnu-c-debug" > > #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" > > #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" > > #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" > > #define PETSC_DIR "/home/kuba/External/petsc-dev" > > #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" > > #define HAVE_GZIP 1 > > #define PETSC_CLANGUAGE_C 1 > > #define PETSC_USE_EXTERN_CXX > > #define PETSC_USE_ERRORCHECKING 1 > > #define PETSC_MISSING_DREAL 1 > > #define PETSC_SIZEOF_MPI_COMM 4 > > #define PETSC_BITS_PER_BYTE 8 > > #define PETSC_SIZEOF_MPI_FINT 4 > > #define PETSC_SIZEOF_VOID_P 4 > > #define PETSC_RETSIGTYPE void > > #define PETSC_HAVE_CXX_COMPLEX 1 > > #define PETSC_SIZEOF_LONG 4 > > #define PETSC_USE_FORTRANKIND 1 > > #define PETSC_SIZEOF_SIZE_T 4 > > #define PETSC_SIZEOF_CHAR 1 > > #define PETSC_SIZEOF_DOUBLE 8 > > #define PETSC_SIZEOF_FLOAT 4 > > #define PETSC_HAVE_C99_COMPLEX 1 > > #define PETSC_SIZEOF_INT 4 > > #define PETSC_SIZEOF_LONG_LONG 8 > > #define PETSC_SIZEOF_SHORT 2 > > #define PETSC_HAVE_STRCASECMP 1 > > #define PETSC_HAVE_POPEN 1 > > #define PETSC_HAVE_SIGSET 1 > > #define PETSC_HAVE_GETWD 1 > > #define PETSC_HAVE_VSNPRINTF 1 > > #define PETSC_HAVE_TIMES 1 > > #define PETSC_HAVE_DLSYM 1 > > #define PETSC_HAVE_SNPRINTF 1 > > #define PETSC_HAVE_GETPWUID 1 > > #define PETSC_HAVE_GETHOSTBYNAME 1 > > #define PETSC_HAVE_SLEEP 1 > > #define PETSC_HAVE_DLERROR 1 > > #define PETSC_HAVE_FORK 1 > > #define PETSC_HAVE_RAND 1 > > #define PETSC_HAVE_GETTIMEOFDAY 1 > > #define PETSC_HAVE_DLCLOSE 1 > > #define PETSC_HAVE_UNAME 1 > > #define PETSC_HAVE_GETHOSTNAME 1 > > #define PETSC_HAVE_MKSTEMP 1 > > #define PETSC_HAVE_SIGACTION 1 > > #define PETSC_HAVE_DRAND48 1 > > #define PETSC_HAVE_NANOSLEEP 1 > > #define PETSC_HAVE_VA_COPY 1 > > #define PETSC_HAVE_CLOCK 1 > > #define PETSC_HAVE_ACCESS 1 > > #define PETSC_HAVE_SIGNAL 1 > > #define PETSC_HAVE_USLEEP 1 > > #define PETSC_HAVE_GETRUSAGE 1 > > #define PETSC_HAVE_VFPRINTF 1 > > #define PETSC_HAVE_MEMALIGN 1 > > #define PETSC_HAVE_GETDOMAINNAME 1 > > #define PETSC_HAVE_TIME 1 > > #define PETSC_HAVE_LSEEK 1 > > #define PETSC_HAVE_SOCKET 1 > > #define PETSC_HAVE_SYSINFO 1 > > #define PETSC_HAVE_READLINK 1 > > #define PETSC_HAVE_REALPATH 1 > > #define PETSC_HAVE_DLOPEN 1 > > #define PETSC_HAVE_MEMMOVE 1 > > #define PETSC_HAVE__GFORTRAN_IARGC 1 > > #define PETSC_SIGNAL_CAST > > #define PETSC_HAVE_GETCWD 1 > > #define PETSC_HAVE_VPRINTF 1 > > #define PETSC_HAVE_BZERO 1 > > #define PETSC_HAVE_GETPAGESIZE 1 > > #define PETSC_LEVEL1_DCACHE_LINESIZE 64 > > #define PETSC_LEVEL1_DCACHE_SIZE 32768 > > #define 
PETSC_LEVEL1_DCACHE_ASSOC 8 > > #define PETSC_USE_PROC_FOR_SIZE 1 > > #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 > > #define PETSC_HAVE_SHARED_LIBRARIES 1 > > #define PETSC_MEMALIGN 16 > > #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 > > #define PETSC_HAVE_GFORTRAN_IARGC 1 > > #define PETSC_HAVE_ISINF 1 > > #define PETSC_HAVE_ISNAN 1 > > #define PETSC_HAVE_MPI_COMM_C2F 1 > > #define PETSC_HAVE_MPI_LONG_DOUBLE 1 > > #define PETSC_HAVE_MPI_COMM_F2C 1 > > #define PETSC_HAVE_MPI_FINT 1 > > #define PETSC_HAVE_MPI_F90MODULE 1 > > #define PETSC_HAVE_MPI_FINALIZED 1 > > #define PETSC_HAVE_MPI_COMM_SPAWN 1 > > #define PETSC_HAVE_MPI_WIN_CREATE 1 > > #define PETSC_HAVE_MPIIO 1 > > #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 > > #define PETSC_HAVE_MPI_ALLTOALLW 1 > > #define PETSC_HAVE_MPI_IN_PLACE 1 > > #define PETSC_USE_INFO 1 > > #define PETSC_PETSC_USE_BACKWARD_LOOP 1 > > #define PETSC_Alignx(a,b) > > #define PETSC_USE_DEBUG 1 > > #define PETSC_USE_LOG 1 > > #define PETSC_IS_COLOR_VALUE_TYPE short > > #define PETSC_USE_CTABLE 1 > > #define PETSC_USE_GDB_DEBUGGER 1 > > #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { > > #define PETSC_CUDA_EXTERN_C_END } > > #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 > > #define PETSC_BLASLAPACK_UNDERSCORE 1 > > ----------------------------------------- > > Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include > > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > > -I/usr/local/cuda/include/thrust/ > > Using C/C++ > > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 > > Using Fortran include/module paths: > > -I/home/kuba/External/petsc-dev/include > > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > > -I/usr/local/cuda/include/thrust/ > > Using Fortran > > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g > > ----------------------------------------- > > Using C/C++ > > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > > Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing > > -Wno-unknown-pragmas -g3 > > Using Fortran > > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 > > Using Fortran flags: -Wall -Wno-unused-variable -g > > ----------------------------------------- > > Using libraries: > > -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc > > -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft > > -lcublas -lcudart > > -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > > -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 > > -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread > > -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx > > -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > > ------------------------------------------ > > Using > > mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec > > ========================================== > > /bin/rm -f > > -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* > > /bin/rm -f > > -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod > > BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES > > ========================================= > > libfast in: /home/kuba/External/petsc-dev/src > > 
libfast in: /home/kuba/External/petsc-dev/src/inline > > libfast in: /home/kuba/External/petsc-dev/src/sys > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket > > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > > from /usr/local/cuda/include/cusp/memory.h:20, > > > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > > from send.c:3: > > /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, > > ?;?, ?asm? or ?__attribute__? before ?thrust? > > In file included from /usr/local/cuda/include/cusp/memory.h:22, > > > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > > from send.c:3: > > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > > error: iterator: No such file or directory > > compilation terminated. > > > > > > > > Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > > > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > > > Petsc Release Version 3.1.0 > > > > > > You need petsc-dev for this. > > > > > > From jakub.pola at gmail.com Thu Dec 9 17:52:03 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:52:03 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291938723.2227.37.camel@desktop> Sorry I made mistake in writing It should be petsc-dev instead of 3.1.0 Dnia 2010-12-10, pi? o godzinie 00:38 +0100, Jed Brown pisze: > On Fri, Dec 10, 2010 at 00:07, Jakub Pola > wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp > 0.2.0 > > As I said before, you need petsc-dev for this, CUDA support is not in > 3.1. > > > http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining From bsmith at mcs.anl.gov Thu Dec 9 18:08:36 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 9 Dec 2010 18:08:36 -0600 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291938252.2227.33.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> <1291938252.2227.33.camel@desktop> Message-ID: On Dec 9, 2010, at 5:44 PM, Jakub Pola wrote: > When I used your suggestion I got following error. 
> > kuba at desktop:~/External/petsc-dev$ ./config/configure.py --with-cc=gcc > --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 > --with-cuda=1 --with-debug=no --with-cups=1 --with-thrust=1 --with-cusp NOT --with-cups > =============================================================================== > Configuring PETSc to compile on your > system > =============================================================================== > TESTING: checkInclude from > config.headers(config/BuildSystem/config/headers.py:82) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > ------------------------------------------------------------------------------- > PETSc CUDA support requires the CUSP and Thrust packages > Rerun configure using --with-cusp-dir and --with-thrust-dir > ******************************************************************************* > > > Dnia 2010-12-09, czw o godzinie 17:29 -0600, Satish Balay pisze: >> Try removing options: >> --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust >> [and use --with-cups=1 --with-thrust=1] >> >> As they are in default location - configure picks them automatically. But >> since they are incorrectly specified - the 'default cuda path' gets >> the configure tests going - but this path causes grief later.. >> >>> In file included from /usr/local/cuda/include/cusp/detail/config.h:24, >>> from /usr/local/cuda/include/cusp/memory.h:20, >> >> >> Thats suspporsed to system memory.h - not cups/memory.h >> >> Satish >> >> On Fri, 10 Dec 2010, Jakub Pola wrote: >> >>> Hi again, >>> >>> I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 >>> I have problems with compiling the library. I did following steps: >>> >>> configured pets: >>> >>> ./config/configure.py --with-cc=gcc --with-fc=gfortran >>> --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 >>> --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ >>> --with-thrust-dir=/usr/local/cuda/include/thrust >>> >>> then make: >>> make PETSC_DIR=/home/kuba/External/petsc-dev >>> PETSC_ARCH=arch-linux-gnu-c-debug all >>> >>> here I have problems because compilator says that >>> /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal >>> error: iterator: No such file or directory >>> >>> But actually it is as a symbolic link: >>> ls -l /usr/local/cuda/include/ >>> gives >>> lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust >>> -> /home/kuba/External/thrust/thrust/ >>> >>> and >>> >>> kuba at desktop:~/External/thrust/thrust/iterator$ ls -l >>> razem 96 >>> -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h >>> -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h >>> drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail >>> -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h >>> -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h >>> -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h >>> -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h >>> -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h >>> -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h >>> -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h >>> -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h >>> kuba at desktop:~/External/thrust/thrust/iterator$ >>> >>> >>> Have somebody faced this kind of problem? 
>>> >>> >>> >>> >>> Here it is compilation log to first error >>> >>> kuba at desktop:~/External/petsc-dev$ make >>> PETSC_DIR=/home/kuba/External/petsc-dev >>> PETSC_ARCH=arch-linux-gnu-c-debug all >>> ========================================== >>> >>> See documentation/faq.html and documentation/bugreporting.html >>> for help with installation problems. Please send EVERYTHING >>> printed out below when reporting problems >>> >>> To subscribe to the PETSc announcement list, send mail to >>> majordomo at mcs.anl.gov with the message: >>> subscribe petsc-announce >>> >>> To subscribe to the PETSc users mailing list, send mail to >>> majordomo at mcs.anl.gov with the message: >>> subscribe petsc-users >>> >>> ========================================== >>> On czw, 9 gru 2010, 23:56:38 CET on desktop >>> Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP >>> Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux >>> ----------------------------------------- >>> Using PETSc directory: /home/kuba/External/petsc-dev >>> Using PETSc arch: arch-linux-gnu-c-debug >>> ----------------------------------------- >>> PETSC_VERSION_RELEASE 0 >>> PETSC_VERSION_MAJOR 3 >>> PETSC_VERSION_MINOR 1 >>> PETSC_VERSION_SUBMINOR 0 >>> PETSC_VERSION_PATCH 6 >>> PETSC_VERSION_DATE "Mar, 25, 2010" >>> PETSC_VERSION_PATCH_DATE "unknown" >>> PETSC_VERSION_HG "unknown" >>> PETSC_VERSION_DATE_HG "unknown" >>> PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ >>> ----------------------------------------- >>> Using configure Options: --with-cc=gcc --with-fc=gfortran >>> --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 >>> --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ >>> --with-thrust-dir=/usr/local/cuda/include/thrust >>> Using configuration flags: >>> #define INCLUDED_PETSCCONF_H >>> #define IS_COLORING_MAX 65535 >>> #define STDC_HEADERS 1 >>> #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT >>> #define PETSC_UINTPTR_T uintptr_t >>> #define PETSC_HAVE_PTHREAD 1 >>> #define PETSC_STATIC_INLINE static inline >>> #define PETSC_REPLACE_DIR_SEPARATOR '\\' >>> #define PETSC_RESTRICT __restrict__ >>> #define PETSC_HAVE_MPI 1 >>> #define PETSC_USE_SINGLE_LIBRARY 1 >>> #define PETSC_USE_SOCKET_VIEWER 1 >>> #define PETSC_HAVE_THRUST 1 >>> #define PETSC_LIB_DIR >>> "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" >>> #define PETSC_HAVE_FORTRAN 1 >>> #define PETSC_HAVE_SOWING 1 >>> #define PETSC_SLSUFFIX "" >>> #define PETSC_FUNCTION_NAME_CXX __func__ >>> #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 >>> #define PETSC_UNUSED >>> #define PETSC_HAVE_CUDA 1 >>> #define PETSC_FUNCTION_NAME_C __func__ >>> #define PETSC_HAVE_C2HTML 1 >>> #define PETSC_HAVE_VALGRIND 1 >>> #define PETSC_HAVE_BUILTIN_EXPECT 1 >>> #define PETSC_DIR_SEPARATOR '/' >>> #define PETSC_PATH_SEPARATOR ':' >>> #define PETSC_HAVE_X11 1 >>> #define PETSC_HAVE_CUSP 1 >>> #define PETSC_Prefetch(a,b,c) >>> #define PETSC_HAVE_BLASLAPACK 1 >>> #define PETSC_HAVE_STRING_H 1 >>> #define PETSC_HAVE_SYS_TYPES_H 1 >>> #define PETSC_HAVE_ENDIAN_H 1 >>> #define PETSC_HAVE_SYS_PROCFS_H 1 >>> #define PETSC_HAVE_DLFCN_H 1 >>> #define PETSC_HAVE_STDINT_H 1 >>> #define PETSC_HAVE_LINUX_KERNEL_H 1 >>> #define PETSC_HAVE_TIME_H 1 >>> #define PETSC_HAVE_MATH_H 1 >>> #define PETSC_HAVE_STDLIB_H 1 >>> #define PETSC_HAVE_SYS_PARAM_H 1 >>> #define PETSC_HAVE_SYS_SOCKET_H 1 >>> #define PETSC_HAVE_UNISTD_H 1 >>> #define PETSC_HAVE_SYS_WAIT_H 1 >>> #define PETSC_HAVE_LIMITS_H 1 >>> #define PETSC_HAVE_SYS_UTSNAME_H 1 >>> #define PETSC_HAVE_NETINET_IN_H 1 >>> #define 
PETSC_HAVE_FENV_H 1 >>> #define PETSC_HAVE_FLOAT_H 1 >>> #define PETSC_HAVE_SEARCH_H 1 >>> #define PETSC_HAVE_SYS_SYSINFO_H 1 >>> #define PETSC_HAVE_SYS_RESOURCE_H 1 >>> #define PETSC_HAVE_SYS_TIMES_H 1 >>> #define PETSC_HAVE_NETDB_H 1 >>> #define PETSC_HAVE_MALLOC_H 1 >>> #define PETSC_HAVE_PWD_H 1 >>> #define PETSC_HAVE_FCNTL_H 1 >>> #define PETSC_HAVE_STRINGS_H 1 >>> #define PETSC_HAVE_MEMORY_H 1 >>> #define PETSC_TIME_WITH_SYS_TIME 1 >>> #define PETSC_HAVE_SYS_TIME_H 1 >>> #define PETSC_USING_F90 1 >>> #define PETSC_HAVE_RTLD_NOW 1 >>> #define PETSC_HAVE_RTLD_LOCAL 1 >>> #define PETSC_HAVE_RTLD_LAZY 1 >>> #define PETSC_C_STATIC_INLINE static inline >>> #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 >>> #define PETSC_HAVE_CXX_NAMESPACE 1 >>> #define PETSC_HAVE_RTLD_GLOBAL 1 >>> #define PETSC_C_RESTRICT __restrict__ >>> #define PETSC_CXX_RESTRICT __restrict__ >>> #define PETSC_CXX_STATIC_INLINE static inline >>> #define PETSC_HAVE_LIBCUBLAS 1 >>> #define PETSC_HAVE_LIBCUDART 1 >>> #define PETSC_HAVE_LIBDL 1 >>> #define PETSC_HAVE_LIBFBLAS 1 >>> #define PETSC_HAVE_LIBFLAPACK 1 >>> #define PETSC_HAVE_ERF 1 >>> #define PETSC_HAVE_LIBCUFFT 1 >>> #define PETSC_HAVE_LIBRT 1 >>> #define PETSC_ARCH "arch-linux-gnu-c-debug" >>> #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" >>> #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" >>> #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" >>> #define PETSC_DIR "/home/kuba/External/petsc-dev" >>> #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" >>> #define HAVE_GZIP 1 >>> #define PETSC_CLANGUAGE_C 1 >>> #define PETSC_USE_EXTERN_CXX >>> #define PETSC_USE_ERRORCHECKING 1 >>> #define PETSC_MISSING_DREAL 1 >>> #define PETSC_SIZEOF_MPI_COMM 4 >>> #define PETSC_BITS_PER_BYTE 8 >>> #define PETSC_SIZEOF_MPI_FINT 4 >>> #define PETSC_SIZEOF_VOID_P 4 >>> #define PETSC_RETSIGTYPE void >>> #define PETSC_HAVE_CXX_COMPLEX 1 >>> #define PETSC_SIZEOF_LONG 4 >>> #define PETSC_USE_FORTRANKIND 1 >>> #define PETSC_SIZEOF_SIZE_T 4 >>> #define PETSC_SIZEOF_CHAR 1 >>> #define PETSC_SIZEOF_DOUBLE 8 >>> #define PETSC_SIZEOF_FLOAT 4 >>> #define PETSC_HAVE_C99_COMPLEX 1 >>> #define PETSC_SIZEOF_INT 4 >>> #define PETSC_SIZEOF_LONG_LONG 8 >>> #define PETSC_SIZEOF_SHORT 2 >>> #define PETSC_HAVE_STRCASECMP 1 >>> #define PETSC_HAVE_POPEN 1 >>> #define PETSC_HAVE_SIGSET 1 >>> #define PETSC_HAVE_GETWD 1 >>> #define PETSC_HAVE_VSNPRINTF 1 >>> #define PETSC_HAVE_TIMES 1 >>> #define PETSC_HAVE_DLSYM 1 >>> #define PETSC_HAVE_SNPRINTF 1 >>> #define PETSC_HAVE_GETPWUID 1 >>> #define PETSC_HAVE_GETHOSTBYNAME 1 >>> #define PETSC_HAVE_SLEEP 1 >>> #define PETSC_HAVE_DLERROR 1 >>> #define PETSC_HAVE_FORK 1 >>> #define PETSC_HAVE_RAND 1 >>> #define PETSC_HAVE_GETTIMEOFDAY 1 >>> #define PETSC_HAVE_DLCLOSE 1 >>> #define PETSC_HAVE_UNAME 1 >>> #define PETSC_HAVE_GETHOSTNAME 1 >>> #define PETSC_HAVE_MKSTEMP 1 >>> #define PETSC_HAVE_SIGACTION 1 >>> #define PETSC_HAVE_DRAND48 1 >>> #define PETSC_HAVE_NANOSLEEP 1 >>> #define PETSC_HAVE_VA_COPY 1 >>> #define PETSC_HAVE_CLOCK 1 >>> #define PETSC_HAVE_ACCESS 1 >>> #define PETSC_HAVE_SIGNAL 1 >>> #define PETSC_HAVE_USLEEP 1 >>> #define PETSC_HAVE_GETRUSAGE 1 >>> #define PETSC_HAVE_VFPRINTF 1 >>> #define PETSC_HAVE_MEMALIGN 1 >>> #define PETSC_HAVE_GETDOMAINNAME 1 >>> #define PETSC_HAVE_TIME 1 >>> #define PETSC_HAVE_LSEEK 1 >>> #define PETSC_HAVE_SOCKET 1 >>> #define PETSC_HAVE_SYSINFO 1 >>> #define PETSC_HAVE_READLINK 1 >>> #define PETSC_HAVE_REALPATH 1 >>> #define PETSC_HAVE_DLOPEN 1 >>> #define 
PETSC_HAVE_MEMMOVE 1 >>> #define PETSC_HAVE__GFORTRAN_IARGC 1 >>> #define PETSC_SIGNAL_CAST >>> #define PETSC_HAVE_GETCWD 1 >>> #define PETSC_HAVE_VPRINTF 1 >>> #define PETSC_HAVE_BZERO 1 >>> #define PETSC_HAVE_GETPAGESIZE 1 >>> #define PETSC_LEVEL1_DCACHE_LINESIZE 64 >>> #define PETSC_LEVEL1_DCACHE_SIZE 32768 >>> #define PETSC_LEVEL1_DCACHE_ASSOC 8 >>> #define PETSC_USE_PROC_FOR_SIZE 1 >>> #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 >>> #define PETSC_HAVE_SHARED_LIBRARIES 1 >>> #define PETSC_MEMALIGN 16 >>> #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 >>> #define PETSC_HAVE_GFORTRAN_IARGC 1 >>> #define PETSC_HAVE_ISINF 1 >>> #define PETSC_HAVE_ISNAN 1 >>> #define PETSC_HAVE_MPI_COMM_C2F 1 >>> #define PETSC_HAVE_MPI_LONG_DOUBLE 1 >>> #define PETSC_HAVE_MPI_COMM_F2C 1 >>> #define PETSC_HAVE_MPI_FINT 1 >>> #define PETSC_HAVE_MPI_F90MODULE 1 >>> #define PETSC_HAVE_MPI_FINALIZED 1 >>> #define PETSC_HAVE_MPI_COMM_SPAWN 1 >>> #define PETSC_HAVE_MPI_WIN_CREATE 1 >>> #define PETSC_HAVE_MPIIO 1 >>> #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 >>> #define PETSC_HAVE_MPI_ALLTOALLW 1 >>> #define PETSC_HAVE_MPI_IN_PLACE 1 >>> #define PETSC_USE_INFO 1 >>> #define PETSC_PETSC_USE_BACKWARD_LOOP 1 >>> #define PETSC_Alignx(a,b) >>> #define PETSC_USE_DEBUG 1 >>> #define PETSC_USE_LOG 1 >>> #define PETSC_IS_COLOR_VALUE_TYPE short >>> #define PETSC_USE_CTABLE 1 >>> #define PETSC_USE_GDB_DEBUGGER 1 >>> #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { >>> #define PETSC_CUDA_EXTERN_C_END } >>> #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 >>> #define PETSC_BLASLAPACK_UNDERSCORE 1 >>> ----------------------------------------- >>> Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include >>> -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include >>> -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ >>> -I/usr/local/cuda/include/thrust/ >>> Using C/C++ >>> compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc >>> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 >>> Using Fortran include/module paths: >>> -I/home/kuba/External/petsc-dev/include >>> -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include >>> -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ >>> -I/usr/local/cuda/include/thrust/ >>> Using Fortran >>> compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g >>> ----------------------------------------- >>> Using C/C++ >>> linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc >>> Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing >>> -Wno-unknown-pragmas -g3 >>> Using Fortran >>> linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 >>> Using Fortran flags: -Wall -Wno-unused-variable -g >>> ----------------------------------------- >>> Using libraries: >>> -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc >>> -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft >>> -lcublas -lcudart >>> -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib >>> -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 >>> -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread >>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx >>> -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl >>> ------------------------------------------ >>> Using >>> mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec >>> ========================================== >>> /bin/rm -f 
>>> -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* >>> /bin/rm -f >>> -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod >>> BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES >>> ========================================= >>> libfast in: /home/kuba/External/petsc-dev/src >>> libfast in: /home/kuba/External/petsc-dev/src/inline >>> libfast in: /home/kuba/External/petsc-dev/src/sys >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket >>> In file included from /usr/local/cuda/include/cusp/detail/config.h:24, >>> from /usr/local/cuda/include/cusp/memory.h:20, >>> >>> from /home/kuba/External/petsc-dev/include/petscsys.h:1671, >>> from send.c:3: >>> /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, >>> ?;?, ?asm? or ?__attribute__? before ?thrust? >>> In file included from /usr/local/cuda/include/cusp/memory.h:22, >>> >>> from /home/kuba/External/petsc-dev/include/petscsys.h:1671, >>> from send.c:3: >>> /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal >>> error: iterator: No such file or directory >>> compilation terminated. >>> >>> >>> >>> Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: >>>> On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: >>>> Petsc Release Version 3.1.0 >>>> >>>> You need petsc-dev for this. >>> >>> >>> > > From Pierre.Moinier at baesystems.com Fri Dec 10 04:36:10 2010 From: Pierre.Moinier at baesystems.com (Moinier, Pierre (UK)) Date: Fri, 10 Dec 2010 10:36:10 -0000 Subject: [petsc-users] GPU-enabled PETSc In-Reply-To: <1291938723.2227.37.camel@desktop> References: <1291923862.2227.14.camel@desktop><1291936065.2227.31.camel@desktop> <1291938723.2227.37.camel@desktop> Message-ID: <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> Hi, Can any one tell me where to find a documentation that gives details on the GPU-enabled PETSc version? I looked in the archive mailing list as well as the petsc-dev doc folder, but did not find anything. What I am looking for are how the implementation is done, what is currently done and some examples... Regards, -Pierre. ******************************************************************** This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ******************************************************************** From jakub.pola at gmail.com Fri Dec 10 05:33:37 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 12:33:37 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291980817.2227.46.camel@desktop> After typing --with-cusp=1 and --with-thrust=1 library was compiled successfully but when I want to make tests i got following error: What could be the reason of that? I have GTX 480 with drivers ver. 260 Ubuntu 10.10 32bit system with 2GB ram. Core duo processor 2.2GHz. 
Test.log: Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] MatFDColoringCreate_SeqAIJ line 21 src/mat/impls/aij/seq/fdaij.c [0]PETSC ERROR: [0] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:02 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrindor see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [1]PETSC ERROR: likely location of problem given in stack below [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] MatFDColoringCreate_MPIAIJ line 24 src/mat/impls/aij/mpi/fdmpiaij.c [0]PETSC ERROR: [0] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. 
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:03 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file [1]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 Note: The EXACT line numbers in the stack are not available, [1]PETSC ERROR: INSTEAD the line number of the start of the function [1]PETSC ERROR: is given. [1]PETSC ERROR: [1] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [1]PETSC ERROR: [1] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [1]PETSC ERROR: [1] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [1]PETSC ERROR: [1] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [1]PETSC ERROR: [1] MatFDColoringCreate_MPIAIJ line 24 src/mat/impls/aij/mpi/fdmpiaij.c [1]PETSC ERROR: [1] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [1]PETSC ERROR: [1] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [1]PETSC ERROR: [1] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Signal received! [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [1]PETSC ERROR: See docs/changes/index.html for recent updates. [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [1]PETSC ERROR: See docs/index.html for manual pages. 
[1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:03 2010 [1]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [1]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [1]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1 [cli_1]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/interface/ftn-custom//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex5f on a arch-linu named desktop by kuba Fri Dec 10 08:36:09 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Completed test examples Dnia 2010-12-10, pi? 
o godzinie 00:38 +0100, Jed Brown pisze: > On Fri, Dec 10, 2010 at 00:07, Jakub Pola > wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp > 0.2.0 > > As I said before, you need petsc-dev for this, CUDA support is not in > 3.1. > > > http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining From bsmith at mcs.anl.gov Fri Dec 10 08:11:29 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 10 Dec 2010 08:11:29 -0600 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291980817.2227.46.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> <1291980817.2227.46.camel@desktop> Message-ID: <5E812C03-40D1-4FD5-A43C-AD405B3746E8@mcs.anl.gov> Just discovered this problem ourselves with 32 bit compiles. You likely need to pass into ./configure the flags --with-cc="gcc -malign-double" --cxx="g++ -malign-double". Nvcc secretly uses that option of its compiles so we need to use it for all the compilers. Barry On Dec 10, 2010, at 5:33 AM, Jakub Pola wrote: > After typing --with-cusp=1 and --with-thrust=1 library was compiled > successfully but when I want to make tests i got following error: > What could be the reason of that? > > I have GTX 480 with drivers ver. 260 > Ubuntu 10.10 32bit system with 2GB ram. > Core duo processor 2.2GHz. > > > Test.log: > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 > src/vec/vec/impls/seq/seqcuda/veccuda.cu > [0]PETSC ERROR: [0] VecGetArray line 226 > src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h > [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 > src/vec/vec/impls/mpi/pbvec.c > [0]PETSC ERROR: [0] VecCreateGhost line 661 > src/vec/vec/impls/mpi/pbvec.c > [0]PETSC ERROR: [0] MatFDColoringCreate_SeqAIJ line 21 > src/mat/impls/aij/seq/fdaij.c > [0]PETSC ERROR: [0] MatFDColoringCreate line 376 > src/mat/matfd/fdmatrix.c > [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c > [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 > src/snes/utils/damgsnes.c > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Development HG revision: > 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 > 2010 +0100 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./ex5f on a arch-linu named desktop by kuba Fri Dec 10 > 08:36:09 2010 > [0]PETSC ERROR: Libraries linked > from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp=1 --with-thrust=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: User provided function() line 0 in unknown directory > unknown file > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) > Completed test examples > > Dnia 2010-12-10, pi? o godzinie 00:38 +0100, Jed Brown pisze: >> On Fri, Dec 10, 2010 at 00:07, Jakub Pola >> wrote: >> I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp >> 0.2.0 >> >> As I said before, you need petsc-dev for this, CUDA support is not in >> 3.1. >> >> >> http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining > > From bsmith at mcs.anl.gov Fri Dec 10 08:12:42 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 10 Dec 2010 08:12:42 -0600 Subject: [petsc-users] GPU-enabled PETSc In-Reply-To: <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> References: <1291923862.2227.14.camel@desktop><1291936065.2227.31.camel@desktop> <1291938723.2227.37.camel@desktop> <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> Message-ID: http://www.mcs.anl.gov/petsc/petsc-as/features/gpus.html since this is only supported in petsc-dev suggest moving any future discussion of these issues to petsc-dev at mcs.anl.gov Barry On Dec 10, 2010, at 4:36 AM, Moinier, Pierre (UK) wrote: > Hi, > > Can any one tell me where to find a documentation that gives details on > the GPU-enabled PETSc version? I looked in the archive mailing list as > well as the petsc-dev doc folder, but did not find anything. What I am > looking for are how the implementation is done, what is currently done > and some examples... > > Regards, > > -Pierre. > > > ******************************************************************** > This email and any attachments are confidential to the intended > recipient and may also be privileged. If you are not the intended > recipient please delete it from your system and notify the sender. > You should not copy it or use it for any purpose nor disclose or > distribute its contents to any other person. > ******************************************************************** > From jakub.pola at gmail.com Fri Dec 10 11:02:10 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 18:02:10 +0100 Subject: [petsc-users] KSPBICG on GPU Message-ID: <1292000530.21002.5.camel@desktop> Hi Does anyone have a benchmark of KSPBICG solver which gives me an information about loop time, GFLOPS, memory transfer bandwidth ? 
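(One way to collect per-solve loop time and flop rates on a given machine is PETSc's stage logging together with the -log_summary option; the following is only a minimal sketch, assuming a KSP object ksp and vectors b and x are already set up, the program is run with -ksp_type bicg -log_summary, and error checking is omitted:

  PetscLogStage stage;

  PetscLogStageRegister("BiCG solve",&stage);
  PetscLogStagePush(stage);
  KSPSolve(ksp,b,x);              /* time and flops for this call are attributed to the stage */
  PetscLogStagePop();

-log_summary then reports wall time and MFlop/s for the stage and for the individual operations such as MatMult and VecDot; memory bandwidth is not printed directly and has to be estimated from those numbers and the known data sizes.)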
Thanks in advance From filippo.spiga at disco.unimib.it Fri Dec 10 12:22:33 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Fri, 10 Dec 2010 13:22:33 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage Message-ID: (sorry in advance for the cross-posting) Dear all, I recently decided to use in my program the routine "PetscOptionsSetValue" to allow to change at runtime the parameter of KSP/SNES. Of course, my code performs several operations on different KSP and SNES objects. Every object should have its own set of options (different tollerancies, different preconditioner) and I want to tune it at runtime. So "PetscOptionsSetValue" it is great. But I have a question: when I set an option, this option is still valid for all the program. Let's assume this situation (it is an example, I use fake options): PetscOptionsSetValue("A", "1"); PetscOptionsSetValue("B", "2"); PetscOptionsSetValue("C", "3"); ierr = KSPSolve(); PetscOptionsSetValue("A", "10"); PetscOptionsSetValue("B", "20"); ierr = SNESSolve(); This is what happens (if I correcly understood): - KSP runs with this presets {A=1, B=2, C=3} that are different than the defaults. - SNES runs with this presets {A=10, B=20, C=3}. So SNES runs with C=3 that is different from the default. But I would like to use the default because C=3 produces wrong errors. How I can easily reset the options' database? Thanks in advance, Cheers -- Filippo SPIGA, MSc Computer Science ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 12:46:36 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 19:46:36 +0100 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 19:22, Filippo Spiga wrote: > So SNES runs with C=3 that is different from the default. But I would like > to use the default because C=3 produces wrong errors. How I can easily reset > the options' database? You're making this much more complicated than necessary. Call KSPSetOptionsPrefix(ksp,"a_"); SNESSetOptionsPrefix(snes,"b_"); KSPSetFromOptions(ksp); SNESSetFromOptions(snes); KSPSolve(ksp,...); SNESSolve(snes,...); Then you run the program with -a_ksp_type gmres -a_pc_type asm -b_snes_monitor -b_ksp_type ibcgs -b_pc_type bjacobi or whatever. You can control all the details of each solver independently. If you *need* to control the solvers in code (usually only if you have an adaptive method where *your program* takes *active* control of the solution process), you should pull out the objects and use the API instead of strings to get what you need: SNESGetKSP(snes,&inner_ksp); KSPSetType(inner_ksp,KSPIBCGS); -------------- next part -------------- An HTML attachment was scrubbed... URL: From filippo.spiga at disco.unimib.it Fri Dec 10 12:54:51 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Fri, 10 Dec 2010 13:54:51 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: Interesting. 
This KSPSetOptionsPrefix(ksp,"a_"); PetscOptionsSetValue("a_A", "1"); PetscOptionsSetValue("a_B", "2"); PetscOptionsSetValue("a_C", "3"); ierr = KSPSolve(); SNESSetOptionsPrefix(snes,"b_"); PetscOptionsSetValue("b_A", "10"); PetscOptionsSetValue("b_B", "20"); ierr = SNESSolve(); should work because the preset will be - KSP : {A=1, B=2, C=3} - SNES : {A=10, B=20, C=PETSC_DEFAULT}. exactly as I want. I know that it is possible to use API to set parameters (KSPSetType, SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no API. Instead of mix option from command-line and API I would like to put everything in a config file, read it and use PetscOptionsSetValue in the right way. It seems reasonable to me. Anyway, every suggestion is welcome (-: Thanks a lot! -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." On Fri, Dec 10, 2010 at 1:46 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 19:22, Filippo Spiga < > filippo.spiga at disco.unimib.it> wrote: > >> So SNES runs with C=3 that is different from the default. But I would like >> to use the default because C=3 produces wrong errors. How I can easily reset >> the options' database? > > > You're making this much more complicated than necessary. Call > > KSPSetOptionsPrefix(ksp,"a_"); > SNESSetOptionsPrefix(snes,"b_"); > > KSPSetFromOptions(ksp); > SNESSetFromOptions(snes); > > KSPSolve(ksp,...); > SNESSolve(snes,...); > > > Then you run the program with > > -a_ksp_type gmres -a_pc_type asm -b_snes_monitor -b_ksp_type ibcgs > -b_pc_type bjacobi > > or whatever. You can control all the details of each solver independently. > > If you *need* to control the solvers in code (usually only if you have an > adaptive method where *your program* takes *active* control of the solution > process), you should pull out the objects and use the API instead of strings > to get what you need: > > SNESGetKSP(snes,&inner_ksp); > KSPSetType(inner_ksp,KSPIBCGS); > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 13:09:33 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 20:09:33 +0100 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 19:54, Filippo Spiga wrote: > I know that it is possible to use API to set parameters (KSPSetType, > SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no > API. Instead of mix option from command-line and API I would like to put > everything in a config file You might be interested in the -options_file command line option and PetscOptionsInsertFile. Also, any options present in ~/.petscrc .petscrc (in current directory) petscrc (in current directory) get slurped in automatically, as well as the string in the PETSC_OPTIONS environment variable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.bloy at gmail.com Fri Dec 10 15:15:13 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Fri, 10 Dec 2010 16:15:13 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? 
In-Reply-To: <4D029757.6060708@seas.upenn.edu> References: <4D029757.6060708@seas.upenn.edu> Message-ID: <4D029861.8000508@gmail.com> Hi I'm new to Petsc so excuse me if this question is naive. I'm trying to solve the following system A x = b for x. A is a sparse square matrix (2000000 by 2000000 with ~45,000,000 nonzero elements) I'm currently using ex1 as the basis for solving the system and it is working quite well. My problem is that i have a large number (~500,000) of b vectors that I would like to find solutions for. My plan is to call KSPsolve repeatedly with each b. However I wonder if there are any solvers or approaches that might benefit from the fact that my A matrix does not change. Are there any decompositions that might still be sparse that would offer a speed up? Thanks for any suggestions. Luke From jed at 59A2.org Fri Dec 10 15:18:43 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 22:18:43 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D029861.8000508@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> Message-ID: On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: > My problem is that i have a large number (~500,000) of b vectors that I > would like to find solutions for. My plan is to call KSPsolve repeatedly > with each b. However I wonder if there are any solvers or approaches that > might benefit from the fact that my A matrix does not change. Are there any > decompositions that might still be sparse that would offer a speed up? 1. What is the high-level problem you are trying to solve? There might be a better way. 2. If you can afford the memory, a direct solve probably makes sense. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbloy at seas.upenn.edu Fri Dec 10 15:10:47 2010 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Fri, 10 Dec 2010 16:10:47 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? Message-ID: <4D029757.6060708@seas.upenn.edu> Hi I'm new to Petsc so excuse me if this question is naive. I'm trying to solve the following system A x = b for x. A is a sparse square matrix (2000000 by 2000000 with ~45,000,000 nonzero elements) I'm currently using ex1 as the basis for solving the system and it is working quite well. My problem is that i have a large number (~500,000) of b vectors that I would like to find solutions for. My plan is to call KSPsolve repeatedly with each b. However I wonder if there are any solvers or approaches that might benefit from the fact that my A matrix does not change. Are there any decompositions that might still be sparse that would offer a speed up? Thanks for any suggestions. Luke From luke.bloy at gmail.com Fri Dec 10 17:03:31 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Fri, 10 Dec 2010 18:03:31 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> Message-ID: <4D02B1C3.4040409@gmail.com> Thanks for the response. On 12/10/2010 04:18 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 22:15, Luke Bloy > wrote: > > My problem is that i have a large number (~500,000) of b vectors > that I would like to find solutions for. My plan is to call > KSPsolve repeatedly with each b. However I wonder if there are any > solvers or approaches that might benefit from the fact that my A > matrix does not change. Are there any decompositions that might > still be sparse that would offer a speed up? > > > 1. 
What is the high-level problem you are trying to solve? There > might be a better way. > I'm solving a diffusion problem. essentially I have 2,000,000 possible states for my system to be in. The system evolves based on a markov matrix M, which describes the probability the system moves from one state to another. This matrix is extremely sparse on the < 100,000,000 nonzero elements. The problem is to pump mass/energy into the system at certain states. What I'm interested in is the steady state behavior of the system. basically the dynamics can be summarized as d_{t+1} = M d_{t} + d_i Where d_t is the state vector at time t and d_i shows the states I am pumping energy into. I want to find d_t as t goes to infinity. My current approach is to solve the following system. (I-M) d = d_i I'm certainly open to any suggestions you might have. > 2. If you can afford the memory, a direct solve probably makes sense. My understanding is the inverses would generally be dense. I certainly don't have any memory to hold a 2 million by 2 million dense matrix, I have about 40G to play with. So perhaps a decomposition might work? Which might you suggest? Thanks Luke -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 17:22:58 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Dec 2010 23:22:58 +0000 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D02B1C3.4040409@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy wrote: > > Thanks for the response. > > On 12/10/2010 04:18 PM, Jed Brown wrote: > > On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: > >> My problem is that i have a large number (~500,000) of b vectors that I >> would like to find solutions for. My plan is to call KSPsolve repeatedly >> with each b. However I wonder if there are any solvers or approaches that >> might benefit from the fact that my A matrix does not change. Are there any >> decompositions that might still be sparse that would offer a speed up? > > > 1. What is the high-level problem you are trying to solve? There might be > a better way. > > I'm solving a diffusion problem. essentially I have 2,000,000 possible > states for my system to be in. The system evolves based on a markov matrix > M, which describes the probability the system moves from one state to > another. This matrix is extremely sparse on the < 100,000,000 nonzero > elements. The problem is to pump mass/energy into the system at certain > states. What I'm interested in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I am > pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > I'm certainly open to any suggestions you might have. > > 2. If you can afford the memory, a direct solve probably makes sense. > > > My understanding is the inverses would generally be dense. I certainly > don't have any memory to hold a 2 million by 2 million dense matrix, I have > about 40G to play with. So perhaps a decomposition might work? Which might > you suggest? > Try -pc_type lu -pc_mat_factor_package once you have reconfigured using --download-superlu_dist --download-mumps They are sparse LU factorization packages that might work. 
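For the many-right-hand-side case in this thread, here is a minimal sketch of that setup, assuming the assembled matrix A, the arrays of vectors b[] and x[], and their count nrhs come from the application; the solver-package option name is the one used by petsc-dev at the time of writing (check your version), and error checking is omitted:

  KSP      ksp;
  PC       pc;
  PetscInt i;

  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A,SAME_PRECONDITIONER);  /* the flag only matters if A is changed later */
  KSPSetType(ksp,KSPPREONLY);                    /* just apply the factorization, no Krylov iterations */
  KSPGetPC(ksp,&pc);
  PCSetType(pc,PCLU);                            /* sparse direct factorization */
  KSPSetFromOptions(ksp);                        /* e.g. -pc_factor_mat_solver_package superlu_dist */
  KSPSetUp(ksp);                                 /* symbolic and numeric factorization happen once here */
  for (i=0; i<nrhs; i++) {
    KSPSolve(ksp,b[i],x[i]);                     /* each solve reuses the factors: forward/back substitution only */
  }
  KSPDestroy(ksp);

Since the operators are never reset, the factorization (or, with an iterative method, the preconditioner setup) from the first setup is reused by every later KSPSolve, so only the triangular solves are paid per right-hand side. If the factors do not fit in memory, the same loop works unchanged with the default iterative KSP and a strong preconditioner.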
Matt > Thanks > Luke > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 17:30:39 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 00:30:39 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D02B1C3.4040409@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: On Sat, Dec 11, 2010 at 00:03, Luke Bloy wrote: > I'm solving a diffusion problem. essentially I have 2,000,000 possible > states for my system to be in. The system evolves based on a markov matrix > M, which describes the probability the system moves from one state to > another. This matrix is extremely sparse on the < 100,000,000 nonzero > elements. The problem is to pump mass/energy into the system at certain > states. What I'm interested in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I am > pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > So you want to do this for some 500,000 d_i? What problem are you really trying to solve? Is it really to just brute-force compute states for all these inputs? What are you doing with the resulting 500k states (all 8 terabytes of it)? Are you, for example, looking for some d_i that would change the steady state d in a certain way? 2. If you can afford the memory, a direct solve probably makes sense. > > > My understanding is the inverses would generally be dense. I certainly > don't have any memory to hold a 2 million by 2 million dense matrix, I have > about 40G to play with. So perhaps a decomposition might work? Which might > you suggest? > While inverses are almost always dense, sparse factorization is far from dense. For PDE problems factored in an optimal ordering, the memory asymptotics are n*log n in 2D and n^{4/3} in 3D. The time asymptotics are n^{3/2} and n^2 respectively. Compare to n^2 memory, n^3 time for dense. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 17:46:40 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 15:46:40 -0800 Subject: [petsc-users] global index distributed arrays Message-ID: Hi guys, I was wondering if there is an easy way of accessing the global index of a node which is not within the local and ghost node regions on every processor for DA. To be more more specific, I am trying to setup a matrix based on a three-dimensional DA. (Star stencil, width=1). For some special nodes, I need to insert nonzero values which do not fit in the local plus the ghost regions of the DA. I know that I can not use MatSetValuesStencil anymore, but I still can use MatSetValues. But I need to know the global index of those nodes. I tried to use DAGetGlobalIndices(), but that would again return the global indices of the local plus ghost nodes on current processor. I know that I could use some MPI commands to pass those indices among processors, but I was wondering if there a clean and neat way with which each processor can have access to the global index for any given 3-d index. 
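For reference, a minimal sketch of the application-ordering route that comes up in the replies below, assuming a 3-D DA called da with global sizes M, N, P, a single degree of freedom per node, and error checking omitted:

  AO       ao;
  PetscInt idx;

  idx = i + j*M + k*M*N;            /* natural (application) ordering index of node (i,j,k) */
  DAGetAO(da,&ao);                  /* the AO belongs to the DA; do not destroy it */
  AOApplicationToPetsc(ao,1,&idx);  /* idx is now the global index in the PETSc ordering */
  /* idx can then be used as a row/column index in MatSetValues() */

As noted in the replies, the AO holds a mapping for the whole grid, so this is convenient but not memory-scalable; if the special nodes are known in advance, converting just those few indices once and storing them is usually enough.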
Thank, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 18:40:43 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 00:40:43 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 11:46 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I was wondering if there is an easy way of accessing the global index of a > node which is not within the local and ghost node regions on every processor > for DA. > To be more more specific, I am trying to setup a matrix based on a > three-dimensional DA. (Star stencil, width=1). > For some special nodes, I need to insert nonzero values which do not fit in > the local plus the ghost regions of the DA. > I know that I can not use MatSetValuesStencil anymore, but I still can use > MatSetValues. But I need to know the global index of those nodes. > I tried to use DAGetGlobalIndices(), but that would again return the global > indices of the local plus ghost nodes on current processor. > I know that I could use some MPI commands to pass those indices among > processors, but I was wondering if there a clean and neat way with which > each processor can have access to the global index for any given 3-d index. > ((k*N + j)*M + i)*C + c Matt > Thank, > Mohamad > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 18:47:36 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 01:47:36 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 01:40, Matthew Knepley wrote: > I know that I could use some MPI commands to pass those indices among >> processors, but I was wondering if there a clean and neat way with which >> each processor can have access to the global index for any given 3-d index. >> > > ((k*N + j)*M + i)*C + c > Matt, you have seriously missed the point. Mohamad, there is not an easy way to calculate it for an arbitrary index, it necessarily involves a search. Do you need it for arbitrary indices, or do you know in advance which global indices are needed? DAGetAO() can give you what you are after, but there is usually a way to avoid using that code since it is not scalable. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 18:53:37 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 00:53:37 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 12:47 AM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:40, Matthew Knepley wrote: > >> I know that I could use some MPI commands to pass those indices among >>> processors, but I was wondering if there a clean and neat way with which >>> each processor can have access to the global index for any given 3-d index. >>> >> >> ((k*N + j)*M + i)*C + c >> > > Matt, you have seriously missed the point. > > Mohamad, there is not an easy way to calculate it for an arbitrary index, > it necessarily involves a search. Do you need it for arbitrary indices, or > do you know in advance which global indices are needed? 
DAGetAO() can give > you what you are after, but there is usually a way to avoid using that code > since it is not scalable. > What are you guys talking about? He is asking ("global index for any given 3-d index") for a map (i, j, k) --> ((k*N + j)*M + i)*C + c I can't imagine what you are searching for? The process which owns a given index does not involve a search. Matt > Jed > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 18:56:30 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 01:56:30 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > What are you guys talking about? He is asking ("global index for any given > 3-d index") for a map > > (i, j, k) --> ((k*N + j)*M + i)*C + c > > I can't imagine what you are searching for? The process which owns a given > index does not involve a search. > The "global index" is in the "PETSc ordering". He wants this index for an arbitrary (i,j,k) which are not in the ghosted patch of the current process. You either have to store the full mapping, on search lx,ly,lz to locate the owner, then compute the index relative to that process. I don't think that code exists in PETSc. It wouldn't be too hard to write, but it's not the most beautiful thing to do. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 19:01:46 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 01:01:46 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 12:56 AM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > >> What are you guys talking about? He is asking ("global index for any given >> 3-d index") for a map >> >> (i, j, k) --> ((k*N + j)*M + i)*C + c >> >> I can't imagine what you are searching for? The process which owns a given >> index does not involve a search. >> > > The "global index" is in the "PETSc ordering". He wants this index for an > arbitrary (i,j,k) which are not in the ghosted patch of the current process. > You either have to store the full mapping, on search lx,ly,lz to locate the > owner, then compute the index relative to that process. I don't think that > code exists in PETSc. It wouldn't be too hard to write, but it's not the > most beautiful thing to do. > I am not sure why you would need to access regions outside your local piece. For instance, consider how we treat boundary conditions. This is access to a fixed index, but we check on each process if (i == i_bc) { } Can't you do the same thing with your extra rows? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:05:21 2010 From: mmnasr at gmail.com (Mohamad M. 
Nasr-Azadani) Date: Fri, 10 Dec 2010 17:05:21 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 4:56 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > >> What are you guys talking about? He is asking ("global index for any given >> 3-d index") for a map >> >> (i, j, k) --> ((k*N + j)*M + i)*C + c >> >> I can't imagine what you are searching for? The process which owns a given >> index does not involve a search. >> > > The "global index" is in the "PETSc ordering". He wants this index for an > arbitrary (i,j,k) which are not in the ghosted patch of the current process. > You either have to store the full mapping, on search lx,ly,lz to locate the > owner, then compute the index relative to that process. I don't think that > code exists in PETSc. It wouldn't be too hard to write, but it's not the > most beautiful thing to do. > Mat, that's exactly came to my mind as the first solution. But I was hoping that I could avoid that. As you said it is not that hard to write that function, that's why I hoped that PETSc alread has that. Thanks for your help, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:07:44 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 17:07:44 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: Sorry for my last message, I meant thanks to Jed, Mohamad On Fri, Dec 10, 2010 at 5:05 PM, Mohamad M. Nasr-Azadani wrote: > > > On Fri, Dec 10, 2010 at 4:56 PM, Jed Brown wrote: > >> On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: >> >>> What are you guys talking about? He is asking ("global index for any >>> given 3-d index") for a map >>> >>> (i, j, k) --> ((k*N + j)*M + i)*C + c >>> >>> I can't imagine what you are searching for? The process which owns a >>> given index does not involve a search. >>> >> >> The "global index" is in the "PETSc ordering". He wants this index for an >> arbitrary (i,j,k) which are not in the ghosted patch of the current process. >> You either have to store the full mapping, on search lx,ly,lz to locate the >> owner, then compute the index relative to that process. I don't think that >> code exists in PETSc. It wouldn't be too hard to write, but it's not the >> most beautiful thing to do. >> > > Mat, that's exactly came to my mind as the first solution. But I was hoping > that I could avoid that. As you said it is not that hard to write that > function, that's why I hoped that PETSc alread has that. > Thanks for your help, > Mohamad > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:14:01 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 17:14:01 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: I am not sure why you would need to access regions outside your local piece. For instance, consider how we treat boundary conditions. This is access to a fixed index, but we check on each process if (i == i_bc) { } Can't you do the same thing with your extra rows? Matt Well, my story is a bit complicated. Now that you asked, I would like to have your opinion too. So, what I am trying to to is to use STAR stenctil to descretize and solve Poisson type equations. So far so good, I can take care of the regular nodes with 3D DA's and width=1. 
The problems rises for some nodes that are neighboring solid boundaries. Those nodes, do not follow Poisson equation anymore and they just obey some interpolation equations which might need nodes in the BOX stencil of width=3. I am restricted by memory requirements, and try to avoid creating my matrix using a 3D DA BOX_STENCIL and width=3. That would cost a lot and soon I need to launch simulations with O(10^8) grid points. So, I thought, I create my matrix using STAR_STENCIL and width=1 and just manually insert the new nonzeros into the matrix. Since the number of those nodes is not that many, I thought it would be a good approach. Thanks if you think you could guide me here too. Thanks for your help too, Mohamad On Fri, Dec 10, 2010 at 5:01 PM, Matthew Knepley wrote: > On Sat, Dec 11, 2010 at 12:56 AM, Jed Brown wrote: > >> On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: >> >>> What are you guys talking about? He is asking ("global index for any >>> given 3-d index") for a map >>> >>> (i, j, k) --> ((k*N + j)*M + i)*C + c >>> >>> I can't imagine what you are searching for? The process which owns a >>> given index does not involve a search. >>> >> >> The "global index" is in the "PETSc ordering". He wants this index for an >> arbitrary (i,j,k) which are not in the ghosted patch of the current process. >> You either have to store the full mapping, on search lx,ly,lz to locate the >> owner, then compute the index relative to that process. I don't think that >> code exists in PETSc. It wouldn't be too hard to write, but it's not the >> most beautiful thing to do. >> > > I am not sure why you would need to access regions outside your local > piece. For instance, > consider how we treat boundary conditions. This is access to a fixed index, > but we check on > each process > > if (i == i_bc) { } > > Can't you do the same thing with your extra rows? > > Matt > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 19:22:46 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 02:22:46 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 02:14, Mohamad M. Nasr-Azadani wrote: > Those nodes, do not follow Poisson equation anymore and they just obey some > interpolation equations which might need nodes in the BOX stencil of > width=3. What kind of boundary condition is this? In any case, you can create an independent DA of the same size as your original, but with a box stencil of width 3. Then DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in petsc-dev) will give you access to those global indices. You will likely have to adjust preallocation for these extra entries (unless, somehow strangely, interpolation actually uses no more points, just from different places). -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 20:03:36 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 18:03:36 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: What kind of boundary condition is this? It is both Dirichlet and Neumann B.C. 
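A small C sketch of the approach described above (an auxiliary DA with a box stencil of width 3 whose local-to-global mapping supplies the indices), assuming 3.1-era calling sequences and one degree of freedom per node. The function and variable names are illustrative; "da_box" stands for a DA created with the same global dimensions as the solution DA but with DA_STENCIL_BOX and width 3.

#include "petscda.h"

/* Translate a global (i,j,k) node index into the global PETSc-ordering
   index needed by MatSetValues(), via the auxiliary box-stencil DA.
   (i,j,k) must lie inside this process's ghost region of da_box. */
PetscErrorCode GlobalIndexFromBoxDA(DA da_box,PetscInt i,PetscInt j,PetscInt k,PetscInt *gidx)
{
  ISLocalToGlobalMapping ltog;
  PetscInt               gxs,gys,gzs,gxm,gym,gzm,local;
  PetscErrorCode         ierr;

  PetscFunctionBegin;
  ierr = DAGetGhostCorners(da_box,&gxs,&gys,&gzs,&gxm,&gym,&gzm);CHKERRQ(ierr);
  /* position of (i,j,k) inside the ghosted local patch of da_box */
  local = ((k - gzs)*gym + (j - gys))*gxm + (i - gxs);
  ierr  = DAGetISLocalToGlobalMapping(da_box,&ltog);CHKERRQ(ierr);
  /* map the ghosted local index to the global (PETSc-ordered) index */
  ierr  = ISLocalToGlobalMappingApply(ltog,1,&local,gidx);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}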
It uses trilinear interpolation to impose the correct boundary condition on the node based on the exact location of the interface. In any case, you can create an independent DA of the same size as your original, but with a box stencil of width 3. Then DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in petsc-dev) will give you access to those global indices. You will likely have to adjust preallocation for these extra entries (unless, somehow strangely, interpolation actually uses no more points, just from different places). That sounds like a good suggestion. I may do this as it seems to be the reasonable and easy way to go. But regarding the memory allocation, for all the regular nodes, for each row, that would be only 7 nonzeroes (STAR stencil). But for the special nodes (close to the boundaries), they need 9 nonzeros per row and that would not necessary follow the STAR stencil. For example, for the node at (i,j,k) it might (depending on the normal direction of the solid surface) add the nonzeros at (i,j,k) (i+1,j+1,k) (i+1,j+2,k) (i+2,j+1,k) (i+2,j+2,k) (i+1,j+1,k+1) (i+1,j+2,k+1) (i+2,j+1,k+1) (i+2,j+2,k+1) Do you think that would add a lot of extra memory allocation? Thanks and have a good weekend, Mohamad On Fri, Dec 10, 2010 at 5:22 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 02:14, Mohamad M. Nasr-Azadani wrote: > >> Those nodes, do not follow Poisson equation anymore and they just obey >> some interpolation equations which might need nodes in the BOX stencil of >> width=3. > > > What kind of boundary condition is this? > > In any case, you can create an independent DA of the same size as your > original, but with a box stencil of width 3. Then > DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in > petsc-dev) will give you access to those global indices. You will likely > have to adjust preallocation for these extra entries (unless, somehow > strangely, interpolation actually uses no more points, just from different > places). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 20:06:30 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 03:06:30 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 03:03, Mohamad M. Nasr-Azadani wrote: > Do you think that would add a lot of extra memory allocation? It's not "a lot", but the first assembly won't go well because the preallocation will be too small (unless you update it). -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbloy at seas.upenn.edu Sat Dec 11 00:21:05 2010 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Sat, 11 Dec 2010 01:21:05 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: <4D031851.4010000@seas.upenn.edu> Matt thanks for the response. I'll give those a try. I'm also interested in try the Cholesky decomposition is there particular external packages that are required to use it? Thanks again. Luke On 12/10/2010 06:22 PM, Matthew Knepley wrote: > On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy > wrote: > > > Thanks for the response. > > On 12/10/2010 04:18 PM, Jed Brown wrote: >> On Fri, Dec 10, 2010 at 22:15, Luke Bloy > > wrote: >> >> My problem is that i have a large number (~500,000) of b >> vectors that I would like to find solutions for. 
My plan is >> to call KSPsolve repeatedly with each b. However I wonder if >> there are any solvers or approaches that might benefit from >> the fact that my A matrix does not change. Are there any >> decompositions that might still be sparse that would offer a >> speed up? >> >> >> 1. What is the high-level problem you are trying to solve? There >> might be a better way. >> > I'm solving a diffusion problem. essentially I have 2,000,000 > possible states for my system to be in. The system evolves based > on a markov matrix M, which describes the probability the system > moves from one state to another. This matrix is extremely sparse > on the < 100,000,000 nonzero elements. The problem is to pump > mass/energy into the system at certain states. What I'm interested > in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I > am pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > I'm certainly open to any suggestions you might have. > >> 2. If you can afford the memory, a direct solve probably makes sense. > > My understanding is the inverses would generally be dense. I > certainly don't have any memory to hold a 2 million by 2 million > dense matrix, I have about 40G to play with. So perhaps a > decomposition might work? Which might you suggest? > > > Try -pc_type lu -pc_mat_factor_package once you > have reconfigured using > > --download-superlu_dist --download-mumps > > They are sparse LU factorization packages that might work. > > Matt > > Thanks > Luke > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Dec 11 00:27:39 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 06:27:39 +0000 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D031851.4010000@seas.upenn.edu> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> <4D031851.4010000@seas.upenn.edu> Message-ID: On Sat, Dec 11, 2010 at 6:21 AM, Luke Bloy wrote: > Matt thanks for the response. I'll give those a try. I'm also interested > in try the Cholesky decomposition is there particular external packages > that are required to use it? > Mumps should do Cholesky for a symmetric matrix. Matt > Thanks again. > Luke > > On 12/10/2010 06:22 PM, Matthew Knepley wrote: > > On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy wrote: > >> >> Thanks for the response. >> >> On 12/10/2010 04:18 PM, Jed Brown wrote: >> >> On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: >> >>> My problem is that i have a large number (~500,000) of b vectors that I >>> would like to find solutions for. My plan is to call KSPsolve repeatedly >>> with each b. However I wonder if there are any solvers or approaches that >>> might benefit from the fact that my A matrix does not change. Are there any >>> decompositions that might still be sparse that would offer a speed up? >> >> >> 1. What is the high-level problem you are trying to solve? There might be >> a better way. >> >> I'm solving a diffusion problem. essentially I have 2,000,000 possible >> states for my system to be in. 
The system evolves based on a markov matrix >> M, which describes the probability the system moves from one state to >> another. This matrix is extremely sparse on the < 100,000,000 nonzero >> elements. The problem is to pump mass/energy into the system at certain >> states. What I'm interested in is the steady state behavior of the system. >> >> basically the dynamics can be summarized as >> >> d_{t+1} = M d_{t} + d_i >> >> Where d_t is the state vector at time t and d_i shows the states I am >> pumping energy into. I want to find d_t as t goes to infinity. >> >> My current approach is to solve the following system. >> >> (I-M) d = d_i >> >> I'm certainly open to any suggestions you might have. >> >> 2. If you can afford the memory, a direct solve probably makes sense. >> >> >> My understanding is the inverses would generally be dense. I certainly >> don't have any memory to hold a 2 million by 2 million dense matrix, I have >> about 40G to play with. So perhaps a decomposition might work? Which might >> you suggest? >> > > Try -pc_type lu -pc_mat_factor_package once you > have reconfigured using > > --download-superlu_dist --download-mumps > > They are sparse LU factorization packages that might work. > > Matt > > >> Thanks >> Luke >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.bloy at gmail.com Sat Dec 11 00:49:49 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Sat, 11 Dec 2010 01:49:49 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: <4D031F0D.9040401@gmail.com> On 12/10/2010 06:30 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 00:03, Luke Bloy > wrote: > > I'm solving a diffusion problem. essentially I have 2,000,000 > possible states for my system to be in. The system evolves based > on a markov matrix M, which describes the probability the system > moves from one state to another. This matrix is extremely sparse > on the < 100,000,000 nonzero elements. The problem is to pump > mass/energy into the system at certain states. What I'm interested > in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I > am pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > > So you want to do this for some 500,000 d_i? What problem are you > really trying to solve? Is it really to just brute-force compute > states for all these inputs? What are you doing with the resulting > 500k states (all 8 terabytes of it)? Are you, for example, looking > for some d_i that would change the steady state d in a certain way? > Yes I'd like do do this for roughly 500,000 d_i. I'm solving a diffusion problem, I only have local measures of the diffusion process which is what i use to determine that matrix M. 
Now the 500,000 d_i are the boundaries to my diffusion problem, what i need to know is who much of what gets pumped in though a given boundary state exits the system through the other boundaries. Do i really need to do all 500,000 Probably not. this is the highest resolution mesh of the boundary i can compute from my data. A lower res mesh would probably be sufficient. But I wont know until i do it either way I'd like to use the highest res mesh that I can. If you can suggest an alternative approach I am all ears. Luke >> 2. If you can afford the memory, a direct solve probably makes sense. > > My understanding is the inverses would generally be dense. I > certainly don't have any memory to hold a 2 million by 2 million > dense matrix, I have about 40G to play with. So perhaps a > decomposition might work? Which might you suggest? > > > While inverses are almost always dense, sparse factorization is far > from dense. For PDE problems factored in an optimal ordering, the > memory asymptotics are n*log n in 2D and n^{4/3} in 3D. The time > asymptotics are n^{3/2} and n^2 respectively. Compare to n^2 memory, > n^3 time for dense. > > Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sat Dec 11 01:20:55 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 08:20:55 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D031F0D.9040401@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> <4D031F0D.9040401@gmail.com> Message-ID: What will you do with the 500000 responses? Jed On Dec 11, 2010 7:49 AM, "Luke Bloy" wrote: On 12/10/2010 06:30 PM, Jed Brown wrote: > > On Sat, Dec 11, 2010 at 00:03, Luke Bloy >> 2. If you can afford the memory, a direct solve probably makes sense. >> >> >> My understanding... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Sat Dec 11 08:32:18 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 15:32:18 +0100 Subject: [petsc-users] MatMult Message-ID: <1292077938.2074.38.camel@desktop> Hello again, I compiled one of te examples. I used sparse matix called 02-raefsky3. I used -vec_type cuda and -mat_type seqaijcuda. When I see summary of the operations performed by program there is MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 0 0 0 2100 0 0 0 147 Does time of performing MatMult includes memory transfer for loading matrix in GPU memory or just exact computation time? Thanks in advance. Kuba. From jakub.pola at gmail.com Sat Dec 11 09:36:46 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 16:36:46 +0100 Subject: [petsc-users] Set number of iterations Message-ID: <1292081806.2074.46.camel@desktop> Hi again, I was searching trought documentation but without success. I would like to know how to set the number of iterations during the KSPSolve(). Is it possible to set the iteration numbers to 100 for example? 
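A minimal sketch of capping the iteration count at 100 through the API form of -ksp_max_it mentioned in the reply below; it assumes an existing KSP ksp and vectors b and x, and only the maximum-iterations argument of KSPSetTolerances() is changed, with the tolerances left at their defaults.

ierr = KSPSetTolerances(ksp,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT,100);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* still lets -ksp_max_it on the command line override */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Note that 100 is an upper bound: KSPSolve() may still stop earlier if the convergence tolerances are met.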
Kuba From filippo.spiga at disco.unimib.it Sat Dec 11 10:05:55 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Sat, 11 Dec 2010 11:05:55 -0500 Subject: [petsc-users] Set number of iterations In-Reply-To: <1292081806.2074.46.camel@desktop> References: <1292081806.2074.46.camel@desktop> Message-ID: Specifying the option "-ksp_max_it 10000" after the name of the executable or in the code using "KSPSetTolerances" See: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/KSP/KSPSetTolerances.html Cheers -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." On Sat, Dec 11, 2010 at 10:36 AM, Jakub Pola wrote: > Hi again, > > I was searching trought documentation but without success. I would like > to know how to set the number of iterations during the KSPSolve(). Is it > possible to set the iteration numbers to 100 for example? > > Kuba > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 11 11:50:17 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 11 Dec 2010 11:50:17 -0600 Subject: [petsc-users] MatMult In-Reply-To: <1292077938.2074.38.camel@desktop> References: <1292077938.2074.38.camel@desktop> Message-ID: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. Hence in your run below, yes it includes the copy time down. But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. Barry On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > Hello again, > > I compiled one of te examples. I used sparse matix called 02-raefsky3. 
> I used -vec_type cuda and -mat_type seqaijcuda. > > When I see summary of the operations performed by program there is > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > 0 0 0 2100 0 0 0 147 > > Does time of performing MatMult includes memory transfer for loading > matrix in GPU memory or just exact computation time? > > Thanks in advance. > Kuba. > From jakub.pola at gmail.com Sat Dec 11 12:08:49 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 19:08:49 +0100 Subject: [petsc-users] MatMult In-Reply-To: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> Message-ID: <1292090930.2074.128.camel@desktop> Thank you very much for you answer. That helps me a lot. Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. > > Hence in your run below, yes it includes the copy time down. > > But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. > > If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. > > You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. > > Barry > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > Hello again, > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > When I see summary of the operations performed by program there is > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > 0 0 0 2100 0 0 0 147 > > > > Does time of performing MatMult includes memory transfer for loading > > matrix in GPU memory or just exact computation time? > > > > Thanks in advance. > > Kuba. > > > From jakub.pola at gmail.com Sun Dec 12 14:14:09 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 21:14:09 +0100 Subject: [petsc-users] Create csr matrix Message-ID: <1292184849.8638.4.camel@desktop> Hi, Could you please help me with creating CSR matrix. I have only one processor so It have to be done locally Here is the matrix from those information I would like to have matrix petsc matrix. 
double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; int r_idx [] = {0, 2, 3, 6, 8} ; int n_rows = 4 ; //square matrix Thank you in advance for help. Kuba From bsmith at mcs.anl.gov Sun Dec 12 14:30:04 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 12 Dec 2010 14:30:04 -0600 Subject: [petsc-users] Create csr matrix In-Reply-To: <1292184849.8638.4.camel@desktop> References: <1292184849.8638.4.camel@desktop> Message-ID: <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> MatCreateSeqAIJWithArrays() On Dec 12, 2010, at 2:14 PM, Jakub Pola wrote: > Hi, > > Could you please help me with creating CSR matrix. I have only one > processor so It have to be done locally > > Here is the matrix from those information I would like to have matrix > petsc matrix. > > double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; > int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; > int r_idx [] = {0, 2, 3, 6, 8} ; > int n_rows = 4 ; //square matrix > > Thank you in advance for help. > Kuba > > > > From jakub.pola at gmail.com Sun Dec 12 14:41:38 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 21:41:38 +0100 Subject: [petsc-users] Create csr matrix In-Reply-To: <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> Message-ID: <1292186498.8638.7.camel@desktop> Thanks, Is there also so easy way to extract those arrays from already created matrix? I created matrix A with function: MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, vals, A); Now I would like to extract all tables from matrix A; Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: > MatCreateSeqAIJWithArrays From bsmith at mcs.anl.gov Sun Dec 12 14:58:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 12 Dec 2010 14:58:16 -0600 Subject: [petsc-users] Create csr matrix In-Reply-To: <1292186498.8638.7.camel@desktop> References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> <1292186498.8638.7.camel@desktop> Message-ID: On Dec 12, 2010, at 2:41 PM, Jakub Pola wrote: > Thanks, > > Is there also so easy way to extract those arrays from already created > matrix? Not particularly because we do not like the idea of directly manipulating the storage details of a particular sparse matrix type. But you can use MatGetRowIJ()/MatRestoreRowIJ() and MatGetArray() if you really want to. > I created matrix A with function: > MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, > vals, A); > > Now I would like to extract all tables from matrix A; What is a "table" of a matrix A? Barry > > > Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: >> MatCreateSeqAIJWithArrays > From jakub.pola at gmail.com Sun Dec 12 15:15:08 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 22:15:08 +0100 Subject: [petsc-users] Create csr matrix In-Reply-To: References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> <1292186498.8638.7.camel@desktop> Message-ID: <1292188508.8638.14.camel@desktop> The reason I want to perform this operation is that I have CG solver based on GPU. It takes arguments in the way I have written so double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; int r_idx [] = {0, 2, 3, 6, 8} ; int n_rows = 4 ; //square matrix That will help me a lot with testing. 
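A minimal, self-contained C sketch of the MatCreateSeqAIJWithArrays() suggestion using exactly the arrays above. PETSc references the three CSR arrays directly rather than copying them, so they must stay allocated for the lifetime of the matrix.

#include "petscmat.h"

int main(int argc,char **argv)
{
  PetscScalar    vals[]  = {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0};
  PetscInt       c_idx[] = {1,3,2,0,1,2,2,3};
  PetscInt       r_idx[] = {0,2,3,6,8};
  PetscInt       n_rows  = 4;                    /* square matrix */
  Mat            A;
  PetscErrorCode ierr;

  PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);
  /* wrap the existing CSR arrays as a sequential AIJ matrix (no copy is made) */
  ierr = MatCreateSeqAIJWithArrays(PETSC_COMM_SELF,n_rows,n_rows,r_idx,c_idx,vals,&A);CHKERRQ(ierr);
  ierr = MatView(A,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);            /* the user arrays themselves are not freed by PETSc */
  ierr = PetscFinalize();
  return 0;
}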
I can load matrix using PETSC then I would like to extract this matrix to those arrays. Dnia 2010-12-12, nie o godzinie 14:58 -0600, Barry Smith pisze: > On Dec 12, 2010, at 2:41 PM, Jakub Pola wrote: > > > Thanks, > > > > Is there also so easy way to extract those arrays from already created > > matrix? > > Not particularly because we do not like the idea of directly manipulating the storage details of a particular sparse matrix type. But you can use MatGetRowIJ()/MatRestoreRowIJ() and MatGetArray() if you really want to. > > > > I created matrix A with function: > > MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, > > vals, A); > > > > > Now I would like to extract all tables from matrix A; > > What is a "table" of a matrix A? > > Barry > > > > > > > Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: > >> MatCreateSeqAIJWithArrays > > > From filippo.spiga at disco.unimib.it Mon Dec 13 00:35:30 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Mon, 13 Dec 2010 01:35:30 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: I was referring to the input file of my application, I do not want to pass PETSc option by the command line or using another specific file input. Anyway, I tried the strategy "KSPSetOptionsPrefix(..)"/"SNES SetOptionsPrefix(...);" + "PetscOptionsSetValue(...)" and it works very well!!! But now I have another question. Using these routines, every time I cyclically call KSP and/or SNES in my program I put new options inside the PETSc option database. Most of these options, that have the same prefix, are replicated. Is this a problem? Is it always true that I will use the last one inserted? Let's consider this example: begin program begin program iteration number 1 CreateKSP(myksp1); PetscOptionsSetValue("-a", "1"); PetscOptionsSetValue("-b", "2"); KSPSolve(myksp1) DestroyKSP(myksp1); end program iteration number 1 begin program iteration number 2 CreateKSP(myksp2); PetscOptionsSetValue("-a", "5"); PetscOptionsSetValue("-c", "5"); KSPSolve(myksp2); DestroyKSP(myksp2); end program iteration number 2 begin program iteration number 3 CreateKSP(myksp3); PetscOptionsSetValue("-a", "10"); PetscOptionsSetValue("-b", "20"); KESPSolve(myksp3); DestroyKSP(myksp3); end program iteration number 3 end program In the iteration 2 appears the option "-c", what can I do to remove the option "-b" inside the internal PETSc option database? And similar, in the iteration 3 the option "-c" disappears and the option "-b" returns with a different value. What can I do to remove the option "-c" inside the internal PETSc option database? Is there a routine that reset/empty completely the internal PETSc database option without terminate the program? Or the only way is to assign a different prefix for every program iteration? Thanks very much in advance, Cheers! -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." 
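A short C sketch of the prefix scheme described in the message above, so that options set for one loop iteration cannot collide with those of a previous one. The prefix string "loop3_" and the option values are illustrative only, and the fragment assumes it runs inside a function after PetscInitialize().

KSP            ksp;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOptionsPrefix(ksp,"loop3_");CHKERRQ(ierr);               /* this solver reads only -loop3_* options */
ierr = PetscOptionsSetValue("-loop3_ksp_type","gmres");CHKERRQ(ierr);
ierr = PetscOptionsSetValue("-loop3_ksp_max_it","100");CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                          /* applies the prefixed options to this KSP */
/* ... KSPSolve(), then KSPDestroy(ksp); a stale entry can later be removed
   with PetscOptionsClearValue("-loop3_ksp_max_it") if desired ... */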
On Fri, Dec 10, 2010 at 2:09 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 19:54, Filippo Spiga < > filippo.spiga at disco.unimib.it> wrote: > >> I know that it is possible to use API to set parameters (KSPSetType, >> SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no >> API. Instead of mix option from command-line and API I would like to put >> everything in a config file > > > You might be interested in the -options_file command line option and > PetscOptionsInsertFile. Also, any options present in > > ~/.petscrc > .petscrc (in current directory) > petscrc (in current directory) > > get slurped in automatically, as well as the string in the PETSC_OPTIONS > environment variable. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Mon Dec 13 00:43:26 2010 From: jed at 59A2.org (Jed Brown) Date: Sun, 12 Dec 2010 22:43:26 -0800 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Sun, Dec 12, 2010 at 22:35, Filippo Spiga wrote: > What can I do to remove the option "-c" inside the internal PETSc option > database? > PetscOptionsClearValue() > Is there a routine that reset/empty completely the internal PETSc database > option without terminate the program? > PetscOptionsClear() > Or the only way is to assign a different prefix for every program > iteration? > What do you have to change differently on each iteration, but want to read from a config file? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Mon Dec 13 01:29:16 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Mon, 13 Dec 2010 08:29:16 +0100 Subject: [petsc-users] MatMult In-Reply-To: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> Message-ID: <1292225356.1803.4.camel@desktop> Hi, Does MatMult function is performed on GPU? when I prepared program which just executes this function with parameters -vec_type cuda and -mat_type seqaijcuda i havent seen in summary log any VecCUDACopyTo entry Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. > > Hence in your run below, yes it includes the copy time down. > > But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. > > If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. 
> > You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. > > Barry > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > Hello again, > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > When I see summary of the operations performed by program there is > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > 0 0 0 2100 0 0 0 147 > > > > Does time of performing MatMult includes memory transfer for loading > > matrix in GPU memory or just exact computation time? > > > > Thanks in advance. > > Kuba. > > > From knepley at gmail.com Mon Dec 13 01:37:00 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Dec 2010 07:37:00 +0000 Subject: [petsc-users] MatMult In-Reply-To: <1292225356.1803.4.camel@desktop> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> Message-ID: Yes, it should run on the GPU. Check an example, like ex19. Matt On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola wrote: > Hi, > > Does MatMult function is performed on GPU? when I prepared program which > just executes this function with parameters -vec_type cuda and -mat_type > seqaijcuda i havent seen in summary log any VecCUDACopyTo entry > > > Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > > To answer this you need to understand that PETSc copies vectors and > matrices to the GPU memory "on demand" (that is exactly when they are first > needed on the GPU, and not before) and once it has copied to the GPU it > keeps track of it and will NOT copy it down again if it is already there. > > > > Hence in your run below, yes it includes the copy time down. > > > > But note that ONE multiply on the GPU is absurd, it does not make > sense to copy a matrix down to the GPU and then do ONE multiply with it. > Thus I NEVER do "sandalone" benchmarking where a single kernel is called by > it self once, the time results are useless. Always run a FULL application > with -log_summary; for example in this case a full KSPSolve() that requires > a bunch of iterations. Then you can look at the performance of each kernel. > The reason to do it this way is that the numbers can be very different and > what matters is runs in APPLICATIONS so that is what should be measured. > > > > If say you run KSP with 20 iterations then the time to copy the matrix > down to the GPU is amortized over those 20 iterations and thus maybe ok. You > should see the flop rate for the MatMult() go up in this case. > > > > You may have noticed we have a log entry for VecCopyToGPU() we will be > adding one for matrices as well thus you will be able to see how long the > copy time is but not that the copy time is still counted in the MatMult() > time if the first copy of the matrix to GPU is triggered by the MatMult. You > can subtract the copy time from the mult time to get the per multiply time, > this would correspond to the multiply time in the limit of a single copy > down and many, many multiplies on the GPU. 
> > > > Barry > > > > > > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > > > Hello again, > > > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > > > When I see summary of the operations performed by program there is > > > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > > 0 0 0 2100 0 0 0 147 > > > > > > Does time of performing MatMult includes memory transfer for loading > > > matrix in GPU memory or just exact computation time? > > > > > > Thanks in advance. > > > Kuba. > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Mon Dec 13 02:20:57 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Mon, 13 Dec 2010 09:20:57 +0100 Subject: [petsc-users] MatMult In-Reply-To: References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> Message-ID: <1292228457.1803.34.camel@desktop> Could you please check the file attached to this email. there is source code and log summary from execution of mat mult. When I run the ex131 with parameters -vec_type cuda and -mat_type seqaijcuda mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda -vec_type cuda -log_summary it fails because of CUDA Error 4. see MatMultKO.log When I run the same program without -vec_type cuda parameter only with -mat_type seqaijcuda it run ok. mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda -log_summary MatMltOK.log When I run without -math_type seqaijcuda only with -vec_type cuda it fails again because terminate called after throwing an instance of 'thrust::system::system_error' what(): invalid argument terminate called after throwing an instance of 'thrust::system::system_error' what(): invalid argument -------------------------------------------------------------------------- mpiexec noticed that process rank 0 with PID 3755 on node desktop exited on signal 6 (Aborted). -------------------------------------------------------------------------- Could you please give me some comments on that Dnia 2010-12-13, pon o godzinie 07:37 +0000, Matthew Knepley pisze: > Yes, it should run on the GPU. Check an example, like ex19. > > > Matt > > On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola > wrote: > Hi, > > Does MatMult function is performed on GPU? when I prepared > program which > just executes this function with parameters -vec_type cuda and > -mat_type > seqaijcuda i havent seen in summary log any VecCUDACopyTo > entry > > > Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith > pisze: > > > > To answer this you need to understand that PETSc copies > vectors and matrices to the GPU memory "on demand" (that is > exactly when they are first needed on the GPU, and not before) > and once it has copied to the GPU it keeps track of it and > will NOT copy it down again if it is already there. > > > > Hence in your run below, yes it includes the copy time > down. > > > > But note that ONE multiply on the GPU is absurd, it does > not make sense to copy a matrix down to the GPU and then do > ONE multiply with it. Thus I NEVER do "sandalone" benchmarking > where a single kernel is called by it self once, the time > results are useless. 
Always run a FULL application with > -log_summary; for example in this case a full KSPSolve() that > requires a bunch of iterations. Then you can look at the > performance of each kernel. The reason to do it this way is > that the numbers can be very different and what matters is > runs in APPLICATIONS so that is what should be measured. > > > > If say you run KSP with 20 iterations then the time to > copy the matrix down to the GPU is amortized over those 20 > iterations and thus maybe ok. You should see the flop rate for > the MatMult() go up in this case. > > > > You may have noticed we have a log entry for > VecCopyToGPU() we will be adding one for matrices as well thus > you will be able to see how long the copy time is but not that > the copy time is still counted in the MatMult() time if the > first copy of the matrix to GPU is triggered by the MatMult. > You can subtract the copy time from the mult time to get the > per multiply time, this would correspond to the multiply time > in the limit of a single copy down and many, many multiplies > on the GPU. > > > > Barry > > > > > > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > > > Hello again, > > > > > > I compiled one of te examples. I used sparse matix called > 02-raefsky3. > > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > > > When I see summary of the operations performed by program > there is > > > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 2100 > > > 0 0 0 2100 0 0 0 147 > > > > > > Does time of performing MatMult includes memory transfer > for loading > > > matrix in GPU memory or just exact computation time? > > > > > > Thanks in advance. > > > Kuba. > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- A non-text attachment was scrubbed... Name: tests.zip Type: application/zip Size: 4031 bytes Desc: not available URL: From maxwindiff at gmail.com Mon Dec 13 04:30:11 2010 From: maxwindiff at gmail.com (Max Ng) Date: Mon, 13 Dec 2010 18:30:11 +0800 Subject: [petsc-users] run direct linear solver in parallel Message-ID: Hi, I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. My program runs correctly in mpiexec -n 1. When I try mpiexec -n 2, I got this error: Assertion failed in file helper_fns.c at line 337: 0 memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 internal ABORT - process 1 Assertion failed in file helper_fns.c at line 337: 0 memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 internal ABORT - process 0 rank 1 in job 113 vm1_57881 caused collective abort of all ranks exit status of rank 1: killed by signal 9 Here is the source code: // N = 40000, n = 20000, nnz = 9 // MatCreate(comm, &mat); MatSetType(mat, MATAIJ); MatSetSizes(mat, n, n, N, N); MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); // some code to fill the matrix values // ... 
KSPCreate(comm, &ksp); KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); KSPSetType(ksp, KSPPREONLY); KSPGetPC(ksp, &pc); PCSetType(pc, PCLU); PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); KSPSetUp(ksp); It crashes at the KSPSetUp() statement. Do you have any ideas? Thanks in advance! Max Ng On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > Hi everyone,> > I am wondering how I can run the direct solver in parallel. I can run> my program in a single processor with direct linear solver by> > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles> > However, when I try to run it with mpi:> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu> -pc_factor_mat_solver_package spooles> > I got error like this:> > [0]PETSC ERROR: --------------------- Error Message> ------------------------------------> [0]PETSC ERROR: No support for this operation for this object type!> [0]PETSC ERROR: Matrix type mpiaij symbolic LU!> > [0]PETSC ERROR: Libraries linked from> /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt> [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008> [0]PETSC ERROR: Configure options --with-shared --with-dynamic> --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi> --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack> --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse> --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]"> --with-superlu=1 --with-superlu-include=/usr/include/superlu> --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1> --with-spooles-include=/usr/include/spooles> --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1> --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr> [0]PETSC ERROR:> ------------------------------------------------------------------------> [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c> [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c> -------------------------------------------------------> > Would you like to tell me where I am doing wrong? I appreciate your help.> > Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 13 08:34:56 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 13 Dec 2010 08:34:56 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> The problem is not in PETSc. Run in the debugger and see exactly where this memcpy() overlap happens and if it can be fixed. Barry On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > Hi, > > I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. > > My program runs correctly in mpiexec -n 1. 
When I try mpiexec -n 2, I got this error: > > Assertion failed in file helper_fns.c at line 337: 0 > memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 > > internal ABORT - process 1 > Assertion failed in file helper_fns.c at line 337: 0 > memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 > > internal ABORT - process 0 > rank 1 in job 113 vm1_57881 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 > > Here is the source code: > > // N = 40000, n = 20000, nnz = 9 > // > MatCreate(comm, &mat); > MatSetType(mat, MATAIJ); > MatSetSizes(mat, n, n, N, N); > MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); > MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); > > // some code to fill the matrix values > // ... > > KSPCreate(comm, &ksp); > KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); > KSPSetType(ksp, KSPPREONLY); > > KSPGetPC(ksp, &pc); > PCSetType(pc, PCLU); > PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); > > KSPSetUp(ksp); > > It crashes at the KSPSetUp() statement. > > Do you have any ideas? Thanks in advance! > > Max Ng > > On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> > Hi everyone, >> > >> > I am wondering how I can run the direct solver in parallel. I can run >> > my program in a single processor with direct linear solver by >> > >> > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >> > >> > However, when I try to run it with mpi: >> > >> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >> > -pc_factor_mat_solver_package spooles >> > >> > I got error like this: >> > >> > [0]PETSC ERROR: --------------------- Error Message >> > ------------------------------------ >> > [0]PETSC ERROR: No support for this operation for this object type! >> > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! >> > >> > [0]PETSC ERROR: Libraries linked from >> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic >> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >> > --with-superlu=1 --with-superlu-include=/usr/include/superlu >> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >> > --with-spooles-include=/usr/include/spooles >> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >> > [0]PETSC ERROR: >> > ------------------------------------------------------------------------ >> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >> > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >> > ------------------------------------------------------- >> > >> > Would you like to tell me where I am doing wrong? I appreciate your help. >> > >> > Xiangdong > From hzhang at mcs.anl.gov Mon Dec 13 09:00:51 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 13 Dec 2010 09:00:51 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> References: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> Message-ID: Max, Does superlu_dist crash? 
Spooles has been out of support from its developers for more than 10 years. For small testing problems, it can be faster. Mumps is a good and robust direct solver we usually recommend, but it requires f90. Hong On Mon, Dec 13, 2010 at 8:34 AM, Barry Smith wrote: > > ? The problem is not in PETSc. ? ?Run in the debugger ?and see exactly where this memcpy() overlap happens and if it can be fixed. > > ?Barry > > > On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > >> Hi, >> >> I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. >> >> My program runs correctly in mpiexec -n 1. When I try mpiexec -n 2, I got this error: >> >> Assertion failed in file helper_fns.c at line 337: 0 >> memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 >> >> internal ABORT - process 1 >> Assertion failed in file helper_fns.c at line 337: 0 >> memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 >> >> internal ABORT - process 0 >> rank 1 in job 113 ?vm1_57881 ? caused collective abort of all ranks >> ? exit status of rank 1: killed by signal 9 >> >> Here is the source code: >> >> ? ? ? ? ? ? // N = 40000, n = 20000, nnz = 9 >> ? ? ? ? ? ? // >> ? ? ? ? ? ? MatCreate(comm, &mat); >> ? ? ? ? ? ? MatSetType(mat, MATAIJ); >> ? ? ? ? ? ? MatSetSizes(mat, n, n, N, N); >> ? ? ? ? ? ? MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); >> ? ? ? ? ? ? MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); >> >> ? ? ? ? ? ? // some code to fill the matrix values >> ? ? ? ? ? ? // ... >> >> ? ? ? ? ? ? KSPCreate(comm, &ksp); >> ? ? ? ? ? ? KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); >> ? ? ? ? ? ? KSPSetType(ksp, KSPPREONLY); >> >> ? ? ? ? ? ? KSPGetPC(ksp, &pc); >> ? ? ? ? ? ? PCSetType(pc, PCLU); >> ? ? ? ? ? ? PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); >> >> ? ? ? ? ? ? KSPSetUp(ksp); >> >> It crashes at the KSPSetUp() statement. >> >> Do you have any ideas? Thanks in advance! >> >> Max Ng >> >> On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: >> >>> > Hi everyone, >>> > >>> > I am wondering how I can run the direct solver in parallel. I can run >>> > my program in a single processor with direct linear solver by >>> > >>> > ./foo.out ?-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >>> > >>> > However, when I try to run it with mpi: >>> > >>> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >>> > -pc_factor_mat_solver_package spooles >>> > >>> > I got error like this: >>> > >>> > [0]PETSC ERROR: --------------------- Error Message >>> > ------------------------------------ >>> > [0]PETSC ERROR: No support for this operation for this object type! >>> > [0]PETSC ERROR: Matrix type mpiaij ?symbolic LU! 
>>> > >>> > [0]PETSC ERROR: Libraries linked from >>> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >>> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >>> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic >>> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >>> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >>> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >>> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >>> > --with-superlu=1 --with-superlu-include=/usr/include/superlu >>> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >>> > --with-spooles-include=/usr/include/spooles >>> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >>> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >>> > [0]PETSC ERROR: >>> > ------------------------------------------------------------------------ >>> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >>> > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >>> > ------------------------------------------------------- >>> > >>> > Would you like to tell me where I am doing wrong? I appreciate your help. >>> > >>> > Xiangdong >> > > From u.tabak at tudelft.nl Mon Dec 13 09:11:41 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Mon, 13 Dec 2010 16:11:41 +0100 Subject: [petsc-users] class templates in C++ and Petsc functions Message-ID: <4D0637AD.6060206@tudelft.nl> Dear all, I was trying to write a simple class template code to interface Petsc matrices along with Boost sparse matrices and vice versa. However, my first naive try did not work out of the box, since the template class requires some functions from PETSc libraries and these functions should be located in the object files to be able to compile. Since class template is header only and there is no object code generation, it can not find the PETSc library functions which is logical. I am also new to templates in C++. I just gave a try to create a template class instead of code duplication however it did not end up as expected. Are there some smarter ways to accomplish this task. The function analyzes a boost csr matrix and sets the rows of a Petsc matrix. T is for boost matrices and T1 for Mat in Petsc. 
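For reference, the linking question raised above usually comes down to where the PETSc symbols get resolved rather than to the template itself: a header-only function template needs no PETSc object code when it is parsed, and the PETSc calls it makes are resolved when the translation unit that instantiates it is linked against the PETSc libraries in the usual way. A minimal sketch of such a helper is below; BoostToPetsc and BoostMatrix are placeholder names, the preallocation is left at the default, and whether this matches the failure described above depends on the actual compiler or linker message, which is not shown in this thread.

#include "petscmat.h"

// Minimal header-only helper (placeholder names, not code from this thread).
// The template needs no PETSc object code by itself; MatCreateSeqAIJ() and the
// assembly calls are resolved when the instantiating .cpp is linked against
// the PETSc libraries.
template <typename BoostMatrix>
PetscErrorCode BoostToPetsc(const BoostMatrix &A, Mat *B)
{
  PetscErrorCode ierr;
  // default preallocation for brevity; a real converter would pass per-row counts
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, (PetscInt)A.size1(), (PetscInt)A.size2(),
                         PETSC_DEFAULT, PETSC_NULL, B); CHKERRQ(ierr);
  // ... iterate over A and call MatSetValues() here ...
  ierr = MatAssemblyBegin(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  return 0;
}

The converter template in question follows below.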
I can also get some comments on the code ;) Best regards, Umut template int converter::convertMBo2Pe( const T& A, T1 A_ ){ PetscErrorCode ierr; int cntNnz = 0; typedef typename T::iterator1 i1_t; typedef typename T::iterator2 i2_t; //int nnz[ooFelieMatrix.size1()]; int nnz[A.size1()]; unsigned ind=0; //get information about the matrix double* vals = NULL; for (i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) { nnz[ind] = distance(i1.begin(), i1.end()); ind++; } // create the matrix depending // on the values of the nonzeros // on each row ierr = MatCreateSeqAIJ( PETSC_COMM_SELF, A.size1(), A.size2(), cntNnz, nnz, A_ ); PetscInt rInd = 0, cInd=0; PetscInt* rCount, dummy; rCount = &dummy; // pointer to values in a row PetscScalar* valsOfRowI = NULL; PetscInt* colIndexOfRowI = NULL; PetscInt rC = 1; for(i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) { // allocate space for the values of row I valsOfRowI = new PetscScalar[nnz[rInd]]; colIndexOfRowI = new PetscInt[nnz[rInd]]; for(i2_t i2 = i1.begin(); i2 != i1.end(); ++i2) { colIndexOfRowI[cInd] = i2.index2(); valsOfRowI[cInd] = *i2; cInd++; } // setting one row each time *rCount = rInd; MatSetValues( A_, rC, rCount, nnz[rInd], colIndexOfRowI, valsOfRowI, INSERT_VALUES ); // delete delete [] valsOfRowI; delete [] colIndexOfRowI; rInd++; cInd = 0; } // MatAssemblyBegin( A_, MAT_FINAL_ASSEMBLY ); MatAssemblyEnd( A_, MAT_FINAL_ASSEMBLY ); // return return 0; } -- - Hope is a good thing, maybe the best of things and no good thing ever dies... The Shawshank Redemption, replique of Tim Robbins From maxwindiff at gmail.com Mon Dec 13 09:12:46 2010 From: maxwindiff at gmail.com (Max Ng) Date: Mon, 13 Dec 2010 23:12:46 +0800 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> Message-ID: Hi, The error seems to be trapped by MPICH2's assertions. Is there some way to propagate them to debuggers (gdb, whatever)? Yep, I think I'll try SuperLU_dist again then. Thanks for your advices! Max On Mon, Dec 13, 2010 at 11:00 PM, Hong Zhang wrote: > Max, > Does superlu_dist crash? > Spooles has been out of support from its developers for more than 10 years. > For small testing problems, it can be faster. > > Mumps is a good and robust direct solver we usually recommend, but it > requires f90. > > Hong > > On Mon, Dec 13, 2010 at 8:34 AM, Barry Smith wrote: > > > > The problem is not in PETSc. Run in the debugger and see exactly > where this memcpy() overlap happens and if it can be fixed. > > > > Barry > > > > > > On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > > > >> Hi, > >> > >> I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use > SPOOLES because I need to build on Windows with VC++ (and without a Fortran > compiler). And in my tests somehow SPOOLES performs better than SuperLU. > >> > >> My program runs correctly in mpiexec -n 1. 
When I try mpiexec -n 2, I > got this error: > >> > >> Assertion failed in file helper_fns.c at line 337: 0 > >> memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 > len_=4 > >> > >> internal ABORT - process 1 > >> Assertion failed in file helper_fns.c at line 337: 0 > >> memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 > len_=4 > >> > >> internal ABORT - process 0 > >> rank 1 in job 113 vm1_57881 caused collective abort of all ranks > >> exit status of rank 1: killed by signal 9 > >> > >> Here is the source code: > >> > >> // N = 40000, n = 20000, nnz = 9 > >> // > >> MatCreate(comm, &mat); > >> MatSetType(mat, MATAIJ); > >> MatSetSizes(mat, n, n, N, N); > >> MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); > >> MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, > PETSC_NULL); > >> > >> // some code to fill the matrix values > >> // ... > >> > >> KSPCreate(comm, &ksp); > >> KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); > >> KSPSetType(ksp, KSPPREONLY); > >> > >> KSPGetPC(ksp, &pc); > >> PCSetType(pc, PCLU); > >> PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); > >> > >> KSPSetUp(ksp); > >> > >> It crashes at the KSPSetUp() statement. > >> > >> Do you have any ideas? Thanks in advance! > >> > >> Max Ng > >> > >> On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> > >>> > Hi everyone, > >>> > > >>> > I am wondering how I can run the direct solver in parallel. I can run > >>> > my program in a single processor with direct linear solver by > >>> > > >>> > ./foo.out -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package spooles > >>> > > >>> > However, when I try to run it with mpi: > >>> > > >>> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu > >>> > -pc_factor_mat_solver_package spooles > >>> > > >>> > I got error like this: > >>> > > >>> > [0]PETSC ERROR: --------------------- Error Message > >>> > ------------------------------------ > >>> > [0]PETSC ERROR: No support for this operation for this object type! > >>> > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! > >>> > > >>> > [0]PETSC ERROR: Libraries linked from > >>> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > >>> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 > >>> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic > >>> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi > >>> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack > >>> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse > >>> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" > >>> > --with-superlu=1 --with-superlu-include=/usr/include/superlu > >>> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 > >>> > --with-spooles-include=/usr/include/spooles > >>> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 > >>> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr > >>> > [0]PETSC ERROR: > >>> > > ------------------------------------------------------------------------ > >>> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in > src/mat/interface/matrix.c > >>> > [0]PETSC ERROR: PCSetUp_LU() line 257 in > src/ksp/pc/impls/factor/lu/lu.c > >>> > ------------------------------------------------------- > >>> > > >>> > Would you like to tell me where I am doing wrong? I appreciate your > help. > >>> > > >>> > Xiangdong > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
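For reference, the package switch mentioned above is a one-line change to the KSP/PC setup shown earlier in this thread, assuming PETSc was built with SuperLU_dist (or MUMPS) support; the function name UseParallelLU is a placeholder, and the MAT_SOLVER_* constants are the petsc-3.1 spellings already used in this thread. A sketch:

#include "petscksp.h"

/* Sketch: point an existing KSP at a parallel direct solver instead of
   SPOOLES. Assumes PETSc was configured with SuperLU_dist (or MUMPS). */
PetscErrorCode UseParallelLU(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
  /* MAT_SOLVER_MUMPS is the usual alternative when a Fortran compiler is available */
  ierr = PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SUPERLU_DIST); CHKERRQ(ierr);
  return 0;
}

The same selection can be made without recompiling through the runtime options already used in this thread, for example -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist.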
URL: From m.skates82 at gmail.com Mon Dec 13 11:10:25 2010 From: m.skates82 at gmail.com (Nunion) Date: Mon, 13 Dec 2010 11:10:25 -0600 Subject: [petsc-users] Writing PETSc matrices In-Reply-To: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> References: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> Message-ID: What files should one use to convert vectors from Matlab to PETSc (PetscBinaryWrite is for square matrices)? I have attempted to write directly to binary from Matlab a matrix + vector, and only a vector (using the save command with the -mat option), then read the binary file into PETSc (using ex34.c in ...src/mat/tests); however, the format is not recognized. Thanks, Tom On Tue, Oct 26, 2010 at 3:57 PM, Barry Smith wrote: > > Use PetscBinaryWrite('filename',sparsematlabmatrix) I do not know why > your second argument has quotes around it. > > Barry > > > On Oct 26, 2010, at 3:33 PM, Nunion wrote: > > > Hello, > > > > I am new to PETSc and programming. I have a question concerning writing > PETSc matrices in binary from binary matrices [compressed/uncompressed] > generated in Matlab. I am attempting to use the files in the /bin/matlab > directory, in particular the PetscBinaryWrite.m file. However, the usage > > > > PetscBinaryWrite('matrix.mat','output.ex') does not seem to work. I also > tried using the examples in the /mat directory; however, Matlab does not > support writing complex matrices in ASCII. > > > > Thanks in advance, > > > > Tom > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From u.tabak at tudelft.nl Mon Dec 13 11:20:40 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Mon, 13 Dec 2010 18:20:40 +0100 Subject: [petsc-users] Writing PETSc matrices In-Reply-To: References: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> Message-ID: <4D0655E8.2030800@tudelft.nl> On 12/13/2010 06:10 PM, Nunion wrote: > What files should one use to convert vectors from Matlab to PETSc > (PetscBinaryWrite is for square matrices)? I have attempted to write > directly to binary from Matlab a matrix + vector, and only a vector > (using the save command with the -mat option), then read the binary file > into PETSc (using ex34.c in ...src/mat/tests); however, the format is > not recognized. > > Thanks, > Tom > Trying to read .mat files in PETSc? That is not possible, AFAIK. Use the provided Matlab interface to write the objects in PETSc binary format. Make sure the matrices are in sparse format; I am not sure whether the same applies to vectors, so you should check. Then you can use these binary files generated in MATLAB in PETSc without problems. HTH, U.
From mhender at us.ibm.com Mon Dec 13 11:56:15 2010 From: mhender at us.ibm.com (Michael E Henderson) Date: Mon, 13 Dec 2010 12:56:15 -0500 Subject: [petsc-users] Solution output of TS Message-ID: Hi, I'm using TS and seeing a "flickering" in the output. I use ierr=TSCreate(PETSC_COMM_WORLD,&timeStepper); ierr=TSSetProblemType(timeStepper,TS_NONLINEAR); ierr=TSSetFromOptions(timeStepper); ierr=TSSetIFunction(timeStepper, formOperatorImplicitTimeFunction , (void*)Ldata); ierr=TSSetIJacobian(timeStepper,A,A, formOperatorImplicitTimeJacobian , (void*)Ldata); ierr=TSMonitorSet(timeStepper,SaveTSSolution,(void*)(&data),NULL); I'm using the implicit nonlinear formulation for an ADE and writing the solution in the monitoring routine. If I instead write out the solution in the IFunction routine when the residual is small, I see a different (better) solution passed in.
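For reference, a monitor registered with TSMonitorSet() has the shape sketched below; the Vec argument is the solution TS hands to the monitor for the current step, which is the quantity being compared here against the state seen inside the IFunction. The body is a placeholder, not the SaveTSSolution routine from this message.

#include "petscts.h"

/* Sketch of a TS monitor (placeholder body): u is the solution vector TS
   passes for the current step; write it out or view it here. */
PetscErrorCode MonitorSketch(TS ts, PetscInt step, PetscReal time, Vec u, void *ctx)
{
  PetscErrorCode ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "step %D, time %g\n", step, (double)time); CHKERRQ(ierr);
  /* e.g. VecView(u, ...) or a binary PetscViewer could be used here */
  return 0;
}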
Does this sound like a problem that's been seen before? Thanks, MIke Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 13 12:01:48 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Dec 2010 18:01:48 +0000 Subject: [petsc-users] class templates in C++ and Petsc functions In-Reply-To: <4D0637AD.6060206@tudelft.nl> References: <4D0637AD.6060206@tudelft.nl> Message-ID: What is the error? Matt On Mon, Dec 13, 2010 at 3:11 PM, Umut Tabak wrote: > Dear all, > > I was trying to write a simple class template code to interface Petsc > matrices along with Boost sparse matrices and vice versa. However, my first > naive try did not work out of the box, since the template class requires > some functions from PETSc libraries and these functions should be located in > the object files to be able to compile. Since class template is header only > and there is no object code generation, it can not find the PETSc library > functions which is logical. I am also new to templates in C++. I just gave a > try to create a template class instead of code duplication however it did > not end up as expected. Are there some smarter ways to accomplish this task. > The function analyzes a boost csr matrix and sets the rows of a Petsc > matrix. T is for boost matrices > and T1 for Mat in Petsc. I can also get some comments on the code ;) > > Best regards, > Umut > > template > int converter::convertMBo2Pe( const T& A, > T1 A_ ){ > PetscErrorCode ierr; > int cntNnz = 0; > typedef typename T::iterator1 i1_t; > typedef typename T::iterator2 i2_t; > //int nnz[ooFelieMatrix.size1()]; > int nnz[A.size1()]; > unsigned ind=0; > //get information about the matrix > > double* vals = NULL; > for (i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) > { > nnz[ind] = distance(i1.begin(), i1.end()); > ind++; > } > // create the matrix depending > // on the values of the nonzeros > // on each row > ierr = MatCreateSeqAIJ( PETSC_COMM_SELF, A.size1(), > A.size2(), cntNnz, nnz, > A_ ); > PetscInt rInd = 0, cInd=0; > PetscInt* rCount, dummy; > rCount = &dummy; > // pointer to values in a row > PetscScalar* valsOfRowI = NULL; > PetscInt* colIndexOfRowI = NULL; > PetscInt rC = 1; > for(i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) > { > // allocate space for the values of row I > valsOfRowI = new PetscScalar[nnz[rInd]]; > colIndexOfRowI = new PetscInt[nnz[rInd]]; > for(i2_t i2 = i1.begin(); i2 != i1.end(); ++i2) > { > colIndexOfRowI[cInd] = i2.index2(); > valsOfRowI[cInd] = *i2; > cInd++; > } > // setting one row each time > *rCount = rInd; > MatSetValues( A_, rC, rCount, nnz[rInd], > colIndexOfRowI, valsOfRowI, > INSERT_VALUES ); > // delete > delete [] valsOfRowI; > delete [] colIndexOfRowI; > rInd++; cInd = 0; > } > // > MatAssemblyBegin( A_, MAT_FINAL_ASSEMBLY ); > MatAssemblyEnd( A_, MAT_FINAL_ASSEMBLY ); > // return > return 0; > } > > -- > - Hope is a good thing, maybe the best of things > and no good thing ever dies... > The Shawshank Redemption, replique of Tim Robbins > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 13 21:01:37 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 13 Dec 2010 21:01:37 -0600 Subject: [petsc-users] MatMult In-Reply-To: <1292228457.1803.34.camel@desktop> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> <1292228457.1803.34.camel@desktop> Message-ID: <681B1F8F-BB5F-4EBF-831E-1227218ED3D0@mcs.anl.gov> Runs ok for me. Barry On Dec 13, 2010, at 2:20 AM, Jakub Pola wrote: > Could you please check the file attached to this email. there is source > code and log summary from execution of mat mult. > > When I run the ex131 with parameters -vec_type cuda and -mat_type > seqaijcuda > > mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda > -vec_type cuda -log_summary > > it fails because of CUDA Error 4. see MatMultKO.log > > > When I run the same program without -vec_type cuda parameter only with > -mat_type seqaijcuda it run ok. > mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda > -log_summary > > MatMltOK.log > > When I run without -math_type seqaijcuda only with -vec_type cuda it > fails again because > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): invalid argument > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): invalid argument > -------------------------------------------------------------------------- > mpiexec noticed that process rank 0 with PID 3755 on node desktop exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > > Could you please give me some comments on that > > Dnia 2010-12-13, pon o godzinie 07:37 +0000, Matthew Knepley pisze: >> Yes, it should run on the GPU. Check an example, like ex19. >> >> >> Matt >> >> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola >> wrote: >> Hi, >> >> Does MatMult function is performed on GPU? when I prepared >> program which >> just executes this function with parameters -vec_type cuda and >> -mat_type >> seqaijcuda i havent seen in summary log any VecCUDACopyTo >> entry >> >> >> Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith >> pisze: >> >> >>> To answer this you need to understand that PETSc copies >> vectors and matrices to the GPU memory "on demand" (that is >> exactly when they are first needed on the GPU, and not before) >> and once it has copied to the GPU it keeps track of it and >> will NOT copy it down again if it is already there. >>> >>> Hence in your run below, yes it includes the copy time >> down. >>> >>> But note that ONE multiply on the GPU is absurd, it does >> not make sense to copy a matrix down to the GPU and then do >> ONE multiply with it. Thus I NEVER do "sandalone" benchmarking >> where a single kernel is called by it self once, the time >> results are useless. Always run a FULL application with >> -log_summary; for example in this case a full KSPSolve() that >> requires a bunch of iterations. Then you can look at the >> performance of each kernel. The reason to do it this way is >> that the numbers can be very different and what matters is >> runs in APPLICATIONS so that is what should be measured. >>> >>> If say you run KSP with 20 iterations then the time to >> copy the matrix down to the GPU is amortized over those 20 >> iterations and thus maybe ok. 
You should see the flop rate for >> the MatMult() go up in this case. >>> >>> You may have noticed we have a log entry for >> VecCopyToGPU() we will be adding one for matrices as well thus >> you will be able to see how long the copy time is but not that >> the copy time is still counted in the MatMult() time if the >> first copy of the matrix to GPU is triggered by the MatMult. >> You can subtract the copy time from the mult time to get the >> per multiply time, this would correspond to the multiply time >> in the limit of a single copy down and many, many multiplies >> on the GPU. >>> >>> Barry >>> >>> >>> >>> >>> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: >>> >>>> Hello again, >>>> >>>> I compiled one of te examples. I used sparse matix called >> 02-raefsky3. >>>> I used -vec_type cuda and -mat_type seqaijcuda. >>>> >>>> When I see summary of the operations performed by program >> there is >>>> >>>> MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2100 >>>> 0 0 0 2100 0 0 0 147 >>>> >>>> Does time of performing MatMult includes memory transfer >> for loading >>>> matrix in GPU memory or just exact computation time? >>>> >>>> Thanks in advance. >>>> Kuba. >>>> >>> >> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > From mmnasr at gmail.com Mon Dec 13 22:42:31 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Mon, 13 Dec 2010 20:42:31 -0800 Subject: [petsc-users] KSP solver and Distributed arrays Message-ID: Hi guys, A simple question. Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and rhs, solution vectors with different number of ghost nodes (width)? As an example, can it be possible to solve for vector {x} DA, width=3, STAR stencil, [A] DA, width=1, STAR stencil {b} DA, width=1, STAR stencil I am suspecting that it is not possible. Thanks, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Tue Dec 14 04:15:21 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Tue, 14 Dec 2010 02:15:21 -0800 Subject: [petsc-users] Updating the ghost nodes for distributed arrays Message-ID: Hi guys, Is it possible to update the ghost values from a global to a local vector for distributed arrays when global and local vectors are not from the same DA, but the global vectors are the same? This is the the code that I have, (the only difference between the two DA's is the width. So, I am assuming that any global vector created based on those are going to be the same) G_data is created based on DA_3D, whereas L_data2 is created based on DA_3D2. Vec G_data, L_data; Vec G_data2, L_data2; ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D); ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D2); ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); /* =====> Is this possible? 
*/ ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); Thanks, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 14 08:11:09 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 14 Dec 2010 08:11:09 -0600 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: The linear solvers only know about global dimensions and local sizes (they know nothing about local representations) so if the vectors and matrix are compatible the answer is yes. Barry On Dec 13, 2010, at 10:42 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > A simple question. > Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and rhs, solution vectors with different number of ghost nodes (width)? > As an example, can it be possible to solve for > vector {x} DA, width=3, STAR stencil, > [A] DA, width=1, STAR stencil > {b} DA, width=1, STAR stencil > > I am suspecting that it is not possible. > > Thanks, > Mohamad > From bsmith at mcs.anl.gov Tue Dec 14 08:12:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 14 Dec 2010 08:12:16 -0600 Subject: [petsc-users] Updating the ghost nodes for distributed arrays In-Reply-To: References: Message-ID: On Dec 14, 2010, at 4:15 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > Is it possible to update the ghost values from a global to a local vector for distributed arrays when global and local vectors are not from the same DA, but the global vectors are the same? Yes > This is the the code that I have, (the only difference between the two DA's is the width. So, I am assuming that any global vector created based on those are going to be the same) Yes > > G_data is created based on DA_3D, whereas L_data2 is created based on DA_3D2. > > > Vec G_data, L_data; > Vec G_data2, L_data2; > > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D); > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D2); > > ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); > ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); > > ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); > ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); > > /* =====> Is this possible? */ > ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); > ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); > > > Thanks, > Mohamad > From knepley at gmail.com Tue Dec 14 09:14:04 2010 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Dec 2010 07:14:04 -0800 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: Yes that is possible. The solver does not know about ghost nodes. Matt On Mon, Dec 13, 2010 at 8:42 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > A simple question. > Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and > rhs, solution vectors with different number of ghost nodes (width)? > As an example, can it be possible to solve for > vector {x} DA, width=3, STAR stencil, > [A] DA, width=1, STAR stencil > {b} DA, width=1, STAR stencil > > I am suspecting that it is not possible. 
> > Thanks, > Mohamad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Wed Dec 15 13:09:59 2010 From: ecoon at lanl.gov (Ethan Coon) Date: Wed, 15 Dec 2010 12:09:59 -0700 Subject: [petsc-users] IS from DA by coordinates Message-ID: <1292440199.15255.38.camel@hahn.lanl.gov> Hi all, Is there a cleaner way to create an IS to a global vector on a DA for a subset of nodes using a coordinate value than the following? (Pardon my pseudo-code combo of python and c and imprecise arguments) // get the coordinates of the nodes DAGetCoordinateDA(da, dac) DAGetCoordinates(da, vecc) DAVecGetArray(dac, vecc, vecc_a) // generate a one-dof da with no ghosts and the same parallel // structure as the da DAGetOwnershipRanges(da, lx[], ly[], lz[]) DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ lz[], da_one) // get the global indices of the one-dof da, noting that because // we set the stencil size to zero, the local array of global // indices is the same size as the local portion of the global array // of coordinates DAGetCorners(xs, ys, zs, xl, yl, zl) DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) // loop over the local array returned by // DAGetGlobalIndices and the coordinates, // comparing to the test global_block_indices = [] for i in range(xs, xs+xl): for j in range(ys, ys+yl): for k in range(zs, zs+zl): if vecc_a[k,j,i,:] == whatever_coordinate: global_block_indices.append(indices[k,j,i]) // make the IS ISCreateBlock(comm, ndofs, global_block_indices, coord_is) // restore and destroy etc. This just seems quite complicated with the construction of the one-dof da to get global indices of the block. There might be a better way with ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? Thanks, Ethan -- ------------------------------------- Ethan Coon Post-Doctoral Researcher Mathematical Modeling and Analysis Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------- From bsmith at mcs.anl.gov Wed Dec 15 13:26:18 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 13:26:18 -0600 Subject: [petsc-users] IS from DA by coordinates In-Reply-To: <1292440199.15255.38.camel@hahn.lanl.gov> References: <1292440199.15255.38.camel@hahn.lanl.gov> Message-ID: <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> Ethan, I don't think there is any reason to create a new DA or call DAGetGlobalIndices().. Just call DAGetLocalToGlobalMappingBlock() on the original DA Then for i in range(xs, xs+xl): for j in range(ys, ys+yl): for k in range(zs, zs+zl): if vecc_a[k,j,i,:] == whatever_coordinate: local_indices.append( convert i,j,k, to local numbering with something like (k-zs)*mx*my + (j-ys)*mx + .. ... Now apply ISLocalToGlobalMappingApply to local_indices and you have a list of global indices depending on what you want you do with this beast you may need to scale by bs or 1/bs Barry On Dec 15, 2010, at 1:09 PM, Ethan Coon wrote: > Hi all, > > Is there a cleaner way to create an IS to a global vector on a DA for a > subset of nodes using a coordinate value than the following? 
(Pardon my > pseudo-code combo of python and c and imprecise arguments) > > // get the coordinates of the nodes > DAGetCoordinateDA(da, dac) > DAGetCoordinates(da, vecc) > DAVecGetArray(dac, vecc, vecc_a) > > // generate a one-dof da with no ghosts and the same parallel > // structure as the da > DAGetOwnershipRanges(da, lx[], ly[], lz[]) > DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ > lz[], da_one) > > // get the global indices of the one-dof da, noting that because > // we set the stencil size to zero, the local array of global > // indices is the same size as the local portion of the global array > // of coordinates > DAGetCorners(xs, ys, zs, xl, yl, zl) > DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) > > // loop over the local array returned by > // DAGetGlobalIndices and the coordinates, > // comparing to the test > global_block_indices = [] > for i in range(xs, xs+xl): > for j in range(ys, ys+yl): > for k in range(zs, zs+zl): > if vecc_a[k,j,i,:] == whatever_coordinate: > global_block_indices.append(indices[k,j,i]) > > > // make the IS > ISCreateBlock(comm, ndofs, global_block_indices, coord_is) > > // restore and destroy etc. > > This just seems quite complicated with the construction of the one-dof > da to get global indices of the block. There might be a better way with > ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? > > Thanks, > > Ethan > > > > > -- > ------------------------------------- > Ethan Coon > Post-Doctoral Researcher > Mathematical Modeling and Analysis > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------- > From ecoon at lanl.gov Wed Dec 15 15:04:29 2010 From: ecoon at lanl.gov (Ethan Coon) Date: Wed, 15 Dec 2010 14:04:29 -0700 Subject: [petsc-users] IS from DA by coordinates In-Reply-To: <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> References: <1292440199.15255.38.camel@hahn.lanl.gov> <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> Message-ID: <1292447069.15255.97.camel@hahn.lanl.gov> On Wed, 2010-12-15 at 13:26 -0600, Barry Smith wrote: > Ethan, > > I don't think there is any reason to create a new DA or call DAGetGlobalIndices().. Just call DAGetLocalToGlobalMappingBlock() on the original DA Then > > for i in range(xs, xs+xl): > for j in range(ys, ys+yl): > for k in range(zs, zs+zl): > if vecc_a[k,j,i,:] == whatever_coordinate: > local_indices.append( convert i,j,k, to local numbering with something like (k-zs)*mx*my + (j-ys)*mx + .. > ... > > Now apply ISLocalToGlobalMappingApply Great, this is what I was missing. This should do the trick. Thanks Barry. Ethan > to local_indices and you have a list of global indices depending on what you want you do with this beast you may need to scale by bs or 1/bs > > Barry > > > On Dec 15, 2010, at 1:09 PM, Ethan Coon wrote: > > > Hi all, > > > > Is there a cleaner way to create an IS to a global vector on a DA for a > > subset of nodes using a coordinate value than the following? 
(Pardon my > > pseudo-code combo of python and c and imprecise arguments) > > > > // get the coordinates of the nodes > > DAGetCoordinateDA(da, dac) > > DAGetCoordinates(da, vecc) > > DAVecGetArray(dac, vecc, vecc_a) > > > > // generate a one-dof da with no ghosts and the same parallel > > // structure as the da > > DAGetOwnershipRanges(da, lx[], ly[], lz[]) > > DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ > > lz[], da_one) > > > > // get the global indices of the one-dof da, noting that because > > // we set the stencil size to zero, the local array of global > > // indices is the same size as the local portion of the global array > > // of coordinates > > DAGetCorners(xs, ys, zs, xl, yl, zl) > > DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) > > > > // loop over the local array returned by > > // DAGetGlobalIndices and the coordinates, > > // comparing to the test > > global_block_indices = [] > > for i in range(xs, xs+xl): > > for j in range(ys, ys+yl): > > for k in range(zs, zs+zl): > > if vecc_a[k,j,i,:] == whatever_coordinate: > > global_block_indices.append(indices[k,j,i]) > > > > > > // make the IS > > ISCreateBlock(comm, ndofs, global_block_indices, coord_is) > > > > // restore and destroy etc. > > > > This just seems quite complicated with the construction of the one-dof > > da to get global indices of the block. There might be a better way with > > ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? > > > > Thanks, > > > > Ethan > > > > > > > > > > -- > > ------------------------------------- > > Ethan Coon > > Post-Doctoral Researcher > > Mathematical Modeling and Analysis > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------- > > > -- ------------------------------------- Ethan Coon Post-Doctoral Researcher Mathematical Modeling and Analysis Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------- From vijay.m at gmail.com Wed Dec 15 18:06:53 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 18:06:53 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. Message-ID: Hi, I have an implementation issue with the MatRestrict/Interpolate functions. The problem is that one of my coarser levels (with PCMG) has higher dofs than the finest level. This does not always happen and requires a weird fine mesh system (in a sense) that uses multi-grid, but the idea is that the finest level problem has a high order (HO) discretization while the lower level mesh has a linear tesselation of the finest HO level (which I can optimize) and then adaptively coarsened levels beyond that. Since the number of columns in this case is larger than the number of rows, MatRestrict invariably calls MatMultTranspose to multiply instead of MatMult and vice-versa while calling MatInterpolate. These result in assertion errors while comparing the length of Mat and Vec. The chosen method is based on whether (M>N) which seems to act against what I am doing here... I can always implement a shell matrix to replicate Restrict/Interpolate actions but my question is whether if such discretization will yield a consistent convergence in MG algorithm ? Is there a strong reason for checking if (M>N) rather than just doing (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? 
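For reference, the shell-matrix route mentioned above can be wired into PCMG roughly as sketched below. This is not code from this thread: ApplyRestriction, ApplyInterpolation, nc, nf and level are placeholders, the sizes are written as if for a single process, and only the wiring is shown, not the discretization-specific kernels. Handing PCMG both transfers explicitly avoids leaving it to infer one from the other.

#include "petscksp.h"
/* Depending on the PETSc version, the PCMG calls below may need an extra
   header such as petscpcmg.h. */

/* y = R x : fine -> coarse (user kernel goes here) */
PetscErrorCode ApplyRestriction(Mat R, Vec xfine, Vec ycoarse)
{
  /* ... apply the user-defined restriction ... */
  return 0;
}

/* y = R^T x : coarse -> fine (user kernel goes here) */
PetscErrorCode ApplyInterpolation(Mat R, Vec xcoarse, Vec yfine)
{
  /* ... apply the user-defined prolongation ... */
  return 0;
}

/* Wrap the two kernels in one nc-by-nf MATSHELL and register it for a level. */
PetscErrorCode SetLevelTransfers(PC pc, PetscInt level, PetscInt nc, PetscInt nf)
{
  Mat            R;
  PetscErrorCode ierr;

  ierr = MatCreateShell(PETSC_COMM_WORLD, nc, nf, nc, nf, PETSC_NULL, &R); CHKERRQ(ierr);
  ierr = MatShellSetOperation(R, MATOP_MULT,           (void (*)(void))ApplyRestriction);   CHKERRQ(ierr);
  ierr = MatShellSetOperation(R, MATOP_MULT_TRANSPOSE, (void (*)(void))ApplyInterpolation); CHKERRQ(ierr);

  /* Which of MatMult()/MatMultTranspose() gets applied inside PCMG's
     MatRestrict()/MatInterpolate() is decided by the size checks discussed
     in this thread. */
  ierr = PCMGSetRestriction(pc, level, R);   CHKERRQ(ierr);
  ierr = PCMGSetInterpolation(pc, level, R); CHKERRQ(ierr);
  /* MatDestroy() the local handle R when done; its calling sequence differs
     between petsc-3.1 and later releases */
  return 0;
}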
I would appreciate any detailed answer that you can provide for this and any suggestions to use the existing methods (without implementing the shell restriction) is very welcome. Thanks, vijay From mmnasr at gmail.com Wed Dec 15 19:52:54 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Wed, 15 Dec 2010 17:52:54 -0800 Subject: [petsc-users] Updating the ghost nodes for distributed arrays In-Reply-To: References: Message-ID: Thanks Barry for your help. M On Tue, Dec 14, 2010 at 6:12 AM, Barry Smith wrote: > > On Dec 14, 2010, at 4:15 AM, Mohamad M. Nasr-Azadani wrote: > > > Hi guys, > > > > Is it possible to update the ghost values from a global to a local vector > for distributed arrays when global and local vectors are not from the same > DA, but the global vectors are the same? > > Yes > > > This is the the code that I have, (the only difference between the two > DA's is the width. So, I am assuming that any global vector created based on > those are going to be the same) > > Yes > > > > > > G_data is created based on DA_3D, whereas L_data2 is created based on > DA_3D2. > > > > > > Vec G_data, L_data; > > Vec G_data2, L_data2; > > > > > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, > PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, > PETSC_NULL, &DA_3D); > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, > PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, > PETSC_NULL, PETSC_NULL, &DA_3D2); > > > > ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); > > ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); > > > > ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); > > ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); > > > > /* =====> Is this possible? */ > > ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, > L_data2);CHKERRQ(ierr); > > ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, > L_data2);CHKERRQ(ierr); > > > > > > Thanks, > > Mohamad > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 15 19:53:05 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 19:53:05 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: Message-ID: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> Vijay, The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. I'll try to get it down in the next few hours but it may take a little longer. Barry On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: > Hi, > > I have an implementation issue with the MatRestrict/Interpolate > functions. The problem is that one of my coarser levels (with PCMG) > has higher dofs than the finest level. This does not always happen and > requires a weird fine mesh system (in a sense) that uses multi-grid, > but the idea is that the finest level problem has a high order (HO) > discretization while the lower level mesh has a linear tesselation of > the finest HO level (which I can optimize) and then adaptively > coarsened levels beyond that. 
Since the number of columns in this case > is larger than the number of rows, MatRestrict invariably calls > MatMultTranspose to multiply instead of MatMult and vice-versa while > calling MatInterpolate. These result in assertion errors while > comparing the length of Mat and Vec. The chosen method is based on > whether (M>N) which seems to act against what I am doing here... > > I can always implement a shell matrix to replicate > Restrict/Interpolate actions but my question is whether if such > discretization will yield a consistent convergence in MG algorithm ? > Is there a strong reason for checking if (M>N) rather than just doing > (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would > appreciate any detailed answer that you can provide for this and any > suggestions to use the existing methods (without implementing the > shell restriction) is very welcome. > > Thanks, > vijay From mmnasr at gmail.com Wed Dec 15 19:53:20 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Wed, 15 Dec 2010 17:53:20 -0800 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: Thank you Barry and Matthew. Mohamad On Tue, Dec 14, 2010 at 7:14 AM, Matthew Knepley wrote: > Yes that is possible. The solver does not know about ghost nodes. > > Matt > > > On Mon, Dec 13, 2010 at 8:42 PM, Mohamad M. Nasr-Azadani > wrote: > >> Hi guys, >> >> A simple question. >> Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix >> and rhs, solution vectors with different number of ghost nodes (width)? >> As an example, can it be possible to solve for >> vector {x} DA, width=3, STAR stencil, >> [A] DA, width=1, STAR stencil >> {b} DA, width=1, STAR stencil >> >> I am suspecting that it is not possible. >> >> Thanks, >> Mohamad >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 15 20:04:08 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 20:04:08 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> Message-ID: <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> I have pushed this change to petsc-dev and it is ready for use. Barry Note it can still glitch if the restricted size is exactly the original size. :-( On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: > > Vijay, > > The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. > > I'll try to get it down in the next few hours but it may take a little longer. > > > Barry > > On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: > >> Hi, >> >> I have an implementation issue with the MatRestrict/Interpolate >> functions. The problem is that one of my coarser levels (with PCMG) >> has higher dofs than the finest level. 
This does not always happen and >> requires a weird fine mesh system (in a sense) that uses multi-grid, >> but the idea is that the finest level problem has a high order (HO) >> discretization while the lower level mesh has a linear tesselation of >> the finest HO level (which I can optimize) and then adaptively >> coarsened levels beyond that. Since the number of columns in this case >> is larger than the number of rows, MatRestrict invariably calls >> MatMultTranspose to multiply instead of MatMult and vice-versa while >> calling MatInterpolate. These result in assertion errors while >> comparing the length of Mat and Vec. The chosen method is based on >> whether (M>N) which seems to act against what I am doing here... >> >> I can always implement a shell matrix to replicate >> Restrict/Interpolate actions but my question is whether if such >> discretization will yield a consistent convergence in MG algorithm ? >> Is there a strong reason for checking if (M>N) rather than just doing >> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >> appreciate any detailed answer that you can provide for this and any >> suggestions to use the existing methods (without implementing the >> shell restriction) is very welcome. >> >> Thanks, >> vijay > From vijay.m at gmail.com Wed Dec 15 20:16:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 20:16:37 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> Message-ID: Barry, Thanks for the prompt change ! I do not work on the development version but I can update these matrix routines alone. > ?Note it can still glitch if the restricted size is exactly the original size. :-( Why would it glitch if the restricted size is the same as the original size though ? I dont see a case where your check (M==Ny) would fail. Can you please elaborate more on this ? Vijay On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: > > ?I have pushed this change to petsc-dev and it is ready for use. > > ? Barry > > ?Note it can still glitch if the restricted size is exactly the original size. :-( > > > On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: > >> >> ?Vijay, >> >> ? ?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >> >> ? I'll try to get it down in the next few hours but it may take a little longer. >> >> >> ? Barry >> >> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >> >>> Hi, >>> >>> I have an implementation issue with the MatRestrict/Interpolate >>> functions. The problem is that one of my coarser levels (with PCMG) >>> has higher dofs than the finest level. This does not always happen and >>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>> but the idea is that the finest level problem has a high order (HO) >>> discretization while the lower level mesh has a linear tesselation of >>> the finest HO level (which I can optimize) and then adaptively >>> coarsened levels beyond that. 
Since the number of columns in this case >>> is larger than the number of rows, MatRestrict invariably calls >>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>> calling ?MatInterpolate. These result in assertion errors while >>> comparing the length of Mat and Vec. The chosen method is based on >>> whether (M>N) which seems to act against what I am doing here... >>> >>> I can always implement a shell matrix to replicate >>> Restrict/Interpolate actions but my question is whether if such >>> discretization will yield a consistent convergence in MG algorithm ? >>> Is there a strong reason for checking if (M>N) rather than just doing >>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>> appreciate any detailed answer that you can provide for this and any >>> suggestions to use the existing methods (without implementing the >>> shell restriction) is very welcome. >>> >>> Thanks, >>> vijay >> > > From bsmith at mcs.anl.gov Wed Dec 15 20:36:53 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 20:36:53 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> Message-ID: <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: > Barry, > > Thanks for the prompt change ! I do not work on the development > version but I can update these matrix routines alone. > >> Note it can still glitch if the restricted size is exactly the original size. :-( > > Why would it glitch if the restricted size is the same as the original > size though ? I dont see a case where your check (M==Ny) would fail. > Can you please elaborate more on this ? Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. Barry > > Vijay > > On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >> >> I have pushed this change to petsc-dev and it is ready for use. >> >> Barry >> >> Note it can still glitch if the restricted size is exactly the original size. :-( >> >> >> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >> >>> >>> Vijay, >>> >>> The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>> >>> I'll try to get it down in the next few hours but it may take a little longer. >>> >>> >>> Barry >>> >>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>> >>>> Hi, >>>> >>>> I have an implementation issue with the MatRestrict/Interpolate >>>> functions. The problem is that one of my coarser levels (with PCMG) >>>> has higher dofs than the finest level. This does not always happen and >>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>> but the idea is that the finest level problem has a high order (HO) >>>> discretization while the lower level mesh has a linear tesselation of >>>> the finest HO level (which I can optimize) and then adaptively >>>> coarsened levels beyond that. 
Since the number of columns in this case >>>> is larger than the number of rows, MatRestrict invariably calls >>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>> calling MatInterpolate. These result in assertion errors while >>>> comparing the length of Mat and Vec. The chosen method is based on >>>> whether (M>N) which seems to act against what I am doing here... >>>> >>>> I can always implement a shell matrix to replicate >>>> Restrict/Interpolate actions but my question is whether if such >>>> discretization will yield a consistent convergence in MG algorithm ? >>>> Is there a strong reason for checking if (M>N) rather than just doing >>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>> appreciate any detailed answer that you can provide for this and any >>>> suggestions to use the existing methods (without implementing the >>>> shell restriction) is very welcome. >>>> >>>> Thanks, >>>> vijay >>> >> >> From vijay.m at gmail.com Wed Dec 15 21:12:21 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 21:12:21 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> Message-ID: > ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. hmm, in the case when I provide a matrix for both Interpolate/Restrict, I have (M==N==Nx==Ny), this would always call the MatMult routine. As long as I provide both these operators/matrices explicitly, there is still no problem. A possible issue is only when someone provides just the restriction or prolongation. But I understand that when this happens, the other operator is computed explicitly as its transpose. If this is actual implementation, I still dont see a problem. Although, if the restriction/prolongation operator are implicitly assumed to be transpose of the other, then it will quite horribly fail since only MatMult is called for both. I am not completely sure about the mode currently used in petsc but it would be great if you can help me understand. Note: I probe more on this since my linear tesselation (most often) results in the same number of dofs on the coarser level (p-coarsened/h-refined) and I dont want a glitch to come back and bite me later on.. Vijay On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: > > On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Thanks for the prompt change ! I do not work on the development >> version but I can update these matrix routines alone. >> >>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >> >> Why would it glitch if the restricted size is the same as the original >> size though ? I dont see a case where your check (M==Ny) would fail. >> Can you please elaborate more on this ? > > ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. > > ?Barry > >> >> Vijay >> >> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>> >>> ?I have pushed this change to petsc-dev and it is ready for use. >>> >>> ? Barry >>> >>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>> >>> >>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>> >>>> >>>> ?Vijay, >>>> >>>> ? 
?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>> >>>> ? I'll try to get it down in the next few hours but it may take a little longer. >>>> >>>> >>>> ? Barry >>>> >>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Hi, >>>>> >>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>> has higher dofs than the finest level. This does not always happen and >>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>> but the idea is that the finest level problem has a high order (HO) >>>>> discretization while the lower level mesh has a linear tesselation of >>>>> the finest HO level (which I can optimize) and then adaptively >>>>> coarsened levels beyond that. Since the number of columns in this case >>>>> is larger than the number of rows, MatRestrict invariably calls >>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>> calling ?MatInterpolate. These result in assertion errors while >>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>> whether (M>N) which seems to act against what I am doing here... >>>>> >>>>> I can always implement a shell matrix to replicate >>>>> Restrict/Interpolate actions but my question is whether if such >>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>>> appreciate any detailed answer that you can provide for this and any >>>>> suggestions to use the existing methods (without implementing the >>>>> shell restriction) is very welcome. >>>>> >>>>> Thanks, >>>>> vijay >>>> >>> >>> > > From bsmith at mcs.anl.gov Wed Dec 15 21:16:28 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 21:16:28 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> Message-ID: <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> If you explicitly provide both then you are ok. If you provide one and not the other it will fail silently with a bad algorithm. Barry On Dec 15, 2010, at 9:12 PM, Vijay S. Mahadevan wrote: >> Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. > > hmm, in the case when I provide a matrix for both > Interpolate/Restrict, I have (M==N==Nx==Ny), this would always call > the MatMult routine. As long as I provide both these > operators/matrices explicitly, there is still no problem. A possible > issue is only when someone provides just the restriction or > prolongation. But I understand that when this happens, the other > operator is computed explicitly as its transpose. If this is actual > implementation, I still dont see a problem. 
Although, if the > restriction/prolongation operator are implicitly assumed to be > transpose of the other, then it will quite horribly fail since only > MatMult is called for both. I am not completely sure about the mode > currently used in petsc but it would be great if you can help me > understand. > > Note: I probe more on this since my linear tesselation (most often) > results in the same number of dofs on the coarser level > (p-coarsened/h-refined) and I dont want a glitch to come back and bite > me later on.. > > Vijay > > On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: >> >> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Thanks for the prompt change ! I do not work on the development >>> version but I can update these matrix routines alone. >>> >>>> Note it can still glitch if the restricted size is exactly the original size. :-( >>> >>> Why would it glitch if the restricted size is the same as the original >>> size though ? I dont see a case where your check (M==Ny) would fail. >>> Can you please elaborate more on this ? >> >> Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. >> >> Barry >> >>> >>> Vijay >>> >>> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>>> >>>> I have pushed this change to petsc-dev and it is ready for use. >>>> >>>> Barry >>>> >>>> Note it can still glitch if the restricted size is exactly the original size. :-( >>>> >>>> >>>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>>> >>>>> >>>>> Vijay, >>>>> >>>>> The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>>> >>>>> I'll try to get it down in the next few hours but it may take a little longer. >>>>> >>>>> >>>>> Barry >>>>> >>>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>>> has higher dofs than the finest level. This does not always happen and >>>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>>> but the idea is that the finest level problem has a high order (HO) >>>>>> discretization while the lower level mesh has a linear tesselation of >>>>>> the finest HO level (which I can optimize) and then adaptively >>>>>> coarsened levels beyond that. Since the number of columns in this case >>>>>> is larger than the number of rows, MatRestrict invariably calls >>>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>>> calling MatInterpolate. These result in assertion errors while >>>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>>> whether (M>N) which seems to act against what I am doing here... >>>>>> >>>>>> I can always implement a shell matrix to replicate >>>>>> Restrict/Interpolate actions but my question is whether if such >>>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? 
I would >>>>>> appreciate any detailed answer that you can provide for this and any >>>>>> suggestions to use the existing methods (without implementing the >>>>>> shell restriction) is very welcome. >>>>>> >>>>>> Thanks, >>>>>> vijay >>>>> >>>> >>>> >> >> From vijay.m at gmail.com Thu Dec 16 09:00:48 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 16 Dec 2010 09:00:48 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> Message-ID: Yes, I understand. But isn't there a flag to check if the transpose was implicitly assumed ? It might be bad to compute the transpose of the operator not provided, explicitly but the interface that calls Restrict/Interpolate on Mat such as PCMG or DMMG should probably know this and make the call appropriately. Or you could set the Mat operations for the transposed operator accordingly i.e., MatMult_operator = MatMultTranspose; MatMultTranspose_operator = MatMult. These are just couple of workarounds to tackle the glitch. Barry, if you make further changes on this, I would much appreciate it if you can let me know. Thanks. Vijay On Wed, Dec 15, 2010 at 9:16 PM, Barry Smith wrote: > > ?If you explicitly provide both then you are ok. If you provide one and not the other it will fail silently with a bad algorithm. > > > ?Barry > > On Dec 15, 2010, at 9:12 PM, Vijay S. Mahadevan wrote: > >>> ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. >> >> hmm, in the case when I provide a matrix for both >> Interpolate/Restrict, I have (M==N==Nx==Ny), ?this would always call >> the MatMult routine. As long as I provide both these >> operators/matrices explicitly, there is still no problem. A possible >> issue is only when someone provides just the restriction or >> prolongation. But I understand that when this happens, the other >> operator is computed explicitly as its transpose. If this is actual >> implementation, I still dont see a problem. Although, if the >> restriction/prolongation operator are implicitly assumed to be >> transpose of the other, then it will quite horribly fail since only >> MatMult is called for both. I am not completely sure about the mode >> currently used in petsc but it would be great if you can help me >> understand. >> >> Note: I probe more on this since my linear tesselation (most often) >> results in the same number of dofs on the coarser level >> (p-coarsened/h-refined) and I dont want a glitch to come back and bite >> me later on.. >> >> Vijay >> >> On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: >>> >>> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Thanks for the prompt change ! I do not work on the development >>>> version but I can update these matrix routines alone. >>>> >>>>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>>> >>>> Why would it glitch if the restricted size is the same as the original >>>> size though ? I dont see a case where your check (M==Ny) would fail. >>>> Can you please elaborate more on this ? >>> >>> ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. 
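The shell-matrix route Vijay mentions combines naturally with Barry's advice to provide both transfer operators explicitly, so that neither MatRestrict() nor MatInterpolate() ever has to guess an apply direction. The fragment below is only a rough sketch, not code from this thread: UserRestrict(), UserInterpolate(), the UserCtx struct and the size variables nc/nf/Nc/Nf (plus pc, level, ksp-side variables) are hypothetical stand-ins for the application's own transfer routines and grid sizes.

#include "petscksp.h"

typedef struct { void *grids; /* whatever the transfer routines need */ } UserCtx;

/* ycoarse = R * xfine : the application's restriction */
PetscErrorCode RestrictMult(Mat R,Vec xfine,Vec ycoarse)
{
  UserCtx        *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(R,(void**)&user);CHKERRQ(ierr);
  ierr = UserRestrict(user,xfine,ycoarse);CHKERRQ(ierr);      /* hypothetical routine */
  return 0;
}

/* yfine = P * xcoarse : the application's interpolation */
PetscErrorCode InterpMult(Mat P,Vec xcoarse,Vec yfine)
{
  UserCtx        *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(P,(void**)&user);CHKERRQ(ierr);
  ierr = UserInterpolate(user,xcoarse,yfine);CHKERRQ(ierr);   /* hypothetical routine */
  return 0;
}

/* Rmat is (coarse x fine), Pmat is (fine x coarse); each is applied with plain MatMult */
Mat Rmat,Pmat;
ierr = MatCreateShell(PETSC_COMM_WORLD,nc,nf,Nc,Nf,&user,&Rmat);CHKERRQ(ierr);
ierr = MatShellSetOperation(Rmat,MATOP_MULT,(void(*)(void))RestrictMult);CHKERRQ(ierr);
ierr = MatCreateShell(PETSC_COMM_WORLD,nf,nc,Nf,Nc,&user,&Pmat);CHKERRQ(ierr);
ierr = MatShellSetOperation(Pmat,MATOP_MULT,(void(*)(void))InterpMult);CHKERRQ(ierr);
/* hand PCMG both operators so neither is silently replaced by the other's transpose */
ierr = PCMGSetRestriction(pc,level,Rmat);CHKERRQ(ierr);
ierr = PCMGSetInterpolation(pc,level,Pmat);CHKERRQ(ierr);

Because each shell matrix carries its own MatMult, the coarse and fine sizes can even coincide without running into the equal-size ambiguity discussed above.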
>>> >>> ?Barry >>> >>>> >>>> Vijay >>>> >>>> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>>>> >>>>> ?I have pushed this change to petsc-dev and it is ready for use. >>>>> >>>>> ? Barry >>>>> >>>>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>>>> >>>>> >>>>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>>>> >>>>>> >>>>>> ?Vijay, >>>>>> >>>>>> ? ?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>>>> >>>>>> ? I'll try to get it down in the next few hours but it may take a little longer. >>>>>> >>>>>> >>>>>> ? Barry >>>>>> >>>>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>>>> has higher dofs than the finest level. This does not always happen and >>>>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>>>> but the idea is that the finest level problem has a high order (HO) >>>>>>> discretization while the lower level mesh has a linear tesselation of >>>>>>> the finest HO level (which I can optimize) and then adaptively >>>>>>> coarsened levels beyond that. Since the number of columns in this case >>>>>>> is larger than the number of rows, MatRestrict invariably calls >>>>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>>>> calling ?MatInterpolate. These result in assertion errors while >>>>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>>>> whether (M>N) which seems to act against what I am doing here... >>>>>>> >>>>>>> I can always implement a shell matrix to replicate >>>>>>> Restrict/Interpolate actions but my question is whether if such >>>>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>>>>> appreciate any detailed answer that you can provide for this and any >>>>>>> suggestions to use the existing methods (without implementing the >>>>>>> shell restriction) is very welcome. >>>>>>> >>>>>>> Thanks, >>>>>>> vijay >>>>>> >>>>> >>>>> >>> >>> > > From enjoywm at cs.wm.edu Thu Dec 16 14:52:53 2010 From: enjoywm at cs.wm.edu (enjoywm at cs.wm.edu) Date: Thu, 16 Dec 2010 15:52:53 -0500 Subject: [petsc-users] installation correct? Message-ID: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> Hi, After make test, I got the following output. I want to make sure if the installation is correct. Thanks. 
Yixun command: /petsc-3.1-p6> make PETSC_DIR=/yliu/MyVC/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug test ******************************************output**************************************************************** Running test examples to verify correct installation C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /yliu/MPICH2/bin/mpif90 -c -Wall -Wno-unused-variable -g -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MyVC/petsc-3.1-p6/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include -o ex5f.o ex5f.F ex5f.F:92.72: call PetscOptionsGetReal(PETSC_NULL_CHARACTER,'-par',lambda, 1 Warning: Line truncated at (1) ex5f.F:113.72: call DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR, 1 Warning: Line truncated at (1) ex5f.F:114.72: & i4,i4,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:125.72: call DAGetInfo(da,PETSC_NULL_INTEGER,mx,my,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:126.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:127.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:128.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:130.72: call DAGetCorners(da,xs,ys,PETSC_NULL_INTEGER,xm,ym, 1 Warning: Line truncated at (1) ex5f.F:132.72: call DAGetGhostCorners(da,gxs,gys,PETSC_NULL_INTEGER,gxm,gym, 1 Warning: Line truncated at (1) ex5f.F:188.72: call SNESSetJacobian(snes,A,J,SNESDAComputeJacobian, 1 Warning: Line truncated at (1) ex5f.F:344.72: if (i .eq. 1 .or. j .eq. 1 1 Warning: Line truncated at (1) ex5f.F:348.72: x(i,j) = temp1 * 1 Warning: Line truncated at (1) ex5f.F:412.72: if (i .eq. 1 .or. j .eq. 1 1 Warning: Line truncated at (1) ex5f.F:417.72: uxx = hydhx * (two*u 1 Warning: Line truncated at (1) ex5f.F:517.72: if (i .eq. 1 .or. j .eq. 
1 1 Warning: Line truncated at (1) ex5f.F:522.72: call MatSetValuesLocal(jac,i1,row,i1,col,v, 1 Warning: Line truncated at (1) ex5f.F:528.72: v(3) = two*(hydhx + hxdhy) 1 Warning: Line truncated at (1) ex5f.F:537.72: call MatSetValuesLocal(jac,i1,row,i5,col,v, 1 Warning: Line truncated at (1) /yliu/MPICH2/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lparmetis -lmetis -lflapack -lfblas -lm -L/yliu/MPICH2/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.5 -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl /bin/rm -f ex5f.o Fortran example src/snes/examples/tutorials/ex5f run successfully with 1 MPI process Completed test examples ************************************************************************************************************************** From balay at mcs.anl.gov Thu Dec 16 14:55:41 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 16 Dec 2010 14:55:41 -0600 (CST) Subject: [petsc-users] installation correct? In-Reply-To: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> References: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> Message-ID: yes - you have a valid install. You can ignore the gfortran warnings [if you wish to eliminate them - you can use FFLAGS=-Wno-line-truncation with configure] Satish On Thu, 16 Dec 2010, enjoywm at cs.wm.edu wrote: > Hi, > After make test, I got the following output. > I want to make sure if the installation is correct. > > Thanks. > > Yixun > > command: > /petsc-3.1-p6> make PETSC_DIR=/yliu/MyVC/petsc-3.1-p6 > PETSC_ARCH=linux-gnu-c-debug test > > > ******************************************output**************************************************************** > Running test examples to verify correct installation > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI > process > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI > processes > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html > /yliu/MPICH2/bin/mpif90 -c -Wall -Wno-unused-variable -g > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include > -I/yliu/MyVC/petsc-3.1-p6/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include > -o ex5f.o ex5f.F > ex5f.F:92.72: > > call PetscOptionsGetReal(PETSC_NULL_CHARACTER,'-par',lambda, > 1 > Warning: Line truncated at (1) > ex5f.F:113.72: > > call DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR, > 1 > Warning: Line truncated at (1) > ex5f.F:114.72: > > & i4,i4,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:125.72: > > call DAGetInfo(da,PETSC_NULL_INTEGER,mx,my,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:126.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:127.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:128.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > 
ex5f.F:130.72: > > call DAGetCorners(da,xs,ys,PETSC_NULL_INTEGER,xm,ym, > 1 > Warning: Line truncated at (1) > ex5f.F:132.72: > > call DAGetGhostCorners(da,gxs,gys,PETSC_NULL_INTEGER,gxm,gym, > 1 > Warning: Line truncated at (1) > ex5f.F:188.72: > > call SNESSetJacobian(snes,A,J,SNESDAComputeJacobian, > 1 > Warning: Line truncated at (1) > ex5f.F:344.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:348.72: > > x(i,j) = temp1 * > 1 > Warning: Line truncated at (1) > ex5f.F:412.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:417.72: > > uxx = hydhx * (two*u > 1 > Warning: Line truncated at (1) > ex5f.F:517.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:522.72: > > call MatSetValuesLocal(jac,i1,row,i1,col,v, > 1 > Warning: Line truncated at (1) > ex5f.F:528.72: > > v(3) = two*(hydhx + hxdhy) > 1 > Warning: Line truncated at (1) > ex5f.F:537.72: > > call MatSetValuesLocal(jac,i1,row,i5,col,v, > 1 > Warning: Line truncated at (1) > /yliu/MPICH2/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o > -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib > -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lpetsc -lX11 > -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib > -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lparmetis -lmetis > -lflapack -lfblas -lm -L/yliu/MPICH2/lib > -L/usr/lib64/gcc/x86_64-suse-linux/4.5 -L/usr/x86_64-suse-linux/lib -ldl > -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm > -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > /bin/rm -f ex5f.o > Fortran example src/snes/examples/tutorials/ex5f run successfully with 1 > MPI process > Completed test examples > > ************************************************************************************************************************** > > > > > From mmnasr at gmail.com Fri Dec 17 09:23:42 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 17 Dec 2010 07:23:42 -0800 Subject: [petsc-users] Matrix setup and Distributed arrays, Message-ID: Hi guys, I am trying to solve a simpel Poisson equation on regular grid (test case for my code). This is my problem. Sorry for long discussion. I want to make it clear what I am doing. 1- Create distributed array (same number of processors and parallel layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 2- Create distributed array (same number of processors and parallel layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, &A); 4- Setup Matrix A in the corresponding fashion 4-1 For all the regular local nodes (based on DA_3D_STAR), simply use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. 4-2 For some special nodes (still local though) use a 9-point interpolation equation. Theses nonzeros might extend outside of 1-layer of ghost nodes, also it might be in the direction not corresponding to the 7-point stencil. Hence, it would be possible that some rows in the matrix owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() does not work. Here, the DA_3D_BOX comes useful. 4-3 Return the global index of any node within the range of DA_3D_STAR, using the command ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, &global_indices); 4-4 Now, I can freely use MatSetValues() to insert the 9-point interpolation nonzeros into the matrix (using the returned global indices). 
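Spelled out in code, steps 1-4 above look roughly like the sketch below. It is illustrative only: mx/my/mz, the (i,j,k) indices, local_row/local_col and the interp_vals array are placeholders, and the actual stencil entries are not filled in.

DA          da_star,da_box;
Mat         A;
MatStencil  row,cols[7];
PetscScalar v[7],interp_vals[9];
PetscInt    nghost,*ltog,grow,gcols[9],n;

/* two DAs with identical global size and process layout, different stencils */
ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR,mx,my,mz,
                  PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,
                  PETSC_NULL,PETSC_NULL,PETSC_NULL,&da_star);CHKERRQ(ierr);
ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,mx,my,mz,
                  PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,3,
                  PETSC_NULL,PETSC_NULL,PETSC_NULL,&da_box);CHKERRQ(ierr);

/* matrix preallocated for the 7-point star stencil */
ierr = DAGetMatrix(da_star,MATMPIAIJ,&A);CHKERRQ(ierr);

/* regular rows: logical (i,j,k) indexing via MatSetValuesStencil() */
row.i = i; row.j = j; row.k = k;
/* ... fill cols[0..6] and v[0..6] ... */
ierr = MatSetValuesStencil(A,1,&row,7,cols,v,INSERT_VALUES);CHKERRQ(ierr);

/* special rows: translate ghosted local indices of the BOX DA into global
   indices and insert with MatSetValues(); these entries fall outside the
   star-stencil preallocation done by DAGetMatrix() */
ierr = DAGetGlobalIndices(da_box,&nghost,&ltog);CHKERRQ(ierr);
grow = ltog[local_row];
for (n=0; n<9; n++) gcols[n] = ltog[local_col[n]];
ierr = MatSetValues(A,1,&grow,9,gcols,interp_vals,INSERT_VALUES);CHKERRQ(ierr);

/* one final assembly, called on every process, after ALL insertions */
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);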
I thought this should work since MatSetValues() does not care if it is inserting in the local part or the global section of the matrix on the neighboring processor. However, for a very simple test case (solution of Poisson equation), this does not work in parallel. I narrowed it down and came to a very simple but annoying conclusion. I don't think the test code that I have suffers from any bugs. But this is what I got, When a new nonzero is inserted in (even) one row of the matrix where the new nonzero corresponds to a node from a neighboring processor and outside the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. Even if I insert a "zero" value, but at the given place, unfortunately I get wrong result. Note that, for a matrix of 10*10*10, I even tested that for one single row of a matrix, and unbelievably, I get wrong results. I printed the matrix for two cases where there is a new (nonzero) in the row, and there is not! Comparing the two matrices and using >>diff Bad Good This is the only existing difference: 150c150 < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) (212, 0) --- > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) They are exactly the same, except that there is new zero at (212) which should not alter the results! But I don't get the right result for the first case! This is driving me crazy! I don't have any clue why this happens. I can send you the test code if you think it helps. But to me, it sounds like when the matrix is created using DAGetMatrix(), there might be some sort of restriction to adding new nonzeros to the locations not defined based on the stencil and not within the range of the DA. In advance, thank you so much. Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From keita at cray.com Fri Dec 17 11:53:08 2010 From: keita at cray.com (Keita Teranishi) Date: Fri, 17 Dec 2010 11:53:08 -0600 Subject: [petsc-users] Why cannot ParMetis and Scotch be intergrated togther? Message-ID: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> Hi, I found PETSc's configure script fails to integrate ParMetis and Scotch together. I think Scotch has a option to rename it's ParMetis API to eliminate entry point duplications. Or is there any reason PETSc cannot put them together? Thanks, ================================ Keita Teranishi Scientific Library Group Cray, Inc. keita at cray.com ================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 17 12:34:14 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Dec 2010 10:34:14 -0800 Subject: [petsc-users] Matrix setup and Distributed arrays, In-Reply-To: References: Message-ID: On Fri, Dec 17, 2010 at 7:23 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I am trying to solve a simpel Poisson equation on regular grid (test case > for my code). > This is my problem. Sorry for long discussion. I want to make it clear what > I am doing. 
> > > 1- Create distributed array (same number of processors and parallel > layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 > 2- Create distributed array (same number of processors and parallel > layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 > 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, > &A); > 4- Setup Matrix A in the corresponding fashion > > 4-1 For all the regular local nodes (based on DA_3D_STAR), simply > use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. > 4-2 For some special nodes (still local though) use a 9-point > interpolation equation. Theses nonzeros might extend outside of 1-layer of > ghost nodes, also it might be in the direction not corresponding to the > 7-point stencil. Hence, it would be possible that some rows in the matrix > owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() > does not work. > Here, the DA_3D_BOX comes useful. > 4-3 Return the global index of any node within the range of > DA_3D_STAR, using the command > ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, > &global_indices); > 4-4 Now, I can freely use MatSetValues() to insert the 9-point > interpolation nonzeros into the matrix (using the returned global indices). > > I thought this should work since MatSetValues() does not care if it is > inserting in the local part or the global section of the matrix on the > neighboring processor. > However, for a very simple test case (solution of Poisson equation), this > does not work in parallel. > I narrowed it down and came to a very simple but annoying conclusion. I > don't think the test code that I have suffers from any bugs. > But this is what I got, > When a new nonzero is inserted in (even) one row of the matrix where the > new nonzero corresponds to a node from a neighboring processor and outside > the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. > > Even if I insert a "zero" value, but at the given place, unfortunately I > get wrong result. > > Note that, for a matrix of 10*10*10, I even tested that for one single row > of a matrix, and unbelievably, I get wrong results. > I printed the matrix for two cases where there is a new (nonzero) in the > row, and there is not! > Comparing the two matrices and using > > >>diff Bad Good > > This is the only existing difference: > > 150c150 > < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > (212, 0) > --- > > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > > > They are exactly the same, except that there is new zero at (212) which > should not alter the results! > But I don't get the right result for the first case! > I do not know what you mean by the "right result". If the only difference is a 0 in the matrix, you will get the same result for MatMult(). Did you check this? Thanks, Matt > This is driving me crazy! I don't have any clue why this happens. > > I can send you the test code if you think it helps. > But to me, it sounds like when the matrix is created using DAGetMatrix(), > there might be some sort of restriction to adding new nonzeros to the > locations not defined based on the stencil and not within the range of the > DA. > > In advance, thank you so much. > Mohamad > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
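Matt's "did you check this" can be answered mechanically: apply both assembled matrices to the same vector and look at the norm of the difference. A small sketch, where A_good and A_bad stand for the two assembled operators being compared (they are not variables from the original code):

Vec         x,y1,y2;
PetscReal   nrm;
PetscRandom rctx;

ierr = MatGetVecs(A_good,&x,&y1);CHKERRQ(ierr);
ierr = VecDuplicate(y1,&y2);CHKERRQ(ierr);
ierr = PetscRandomCreate(PETSC_COMM_WORLD,&rctx);CHKERRQ(ierr);
ierr = VecSetRandom(x,rctx);CHKERRQ(ierr);

ierr = MatMult(A_good,x,y1);CHKERRQ(ierr);
ierr = MatMult(A_bad,x,y2);CHKERRQ(ierr);

ierr = VecAXPY(y2,-1.0,y1);CHKERRQ(ierr);        /* y2 <- y2 - y1 */
ierr = VecNorm(y2,NORM_2,&nrm);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD,"||A_bad*x - A_good*x|| = %G\n",nrm);CHKERRQ(ierr);

ierr = PetscRandomDestroy(rctx);CHKERRQ(ierr);
ierr = VecDestroy(x);CHKERRQ(ierr);
ierr = VecDestroy(y1);CHKERRQ(ierr);
ierr = VecDestroy(y2);CHKERRQ(ierr);

If the printed norm is at machine precision, the explicit zero is not what changes the solution, and the difference has to come from elsewhere in the setup or solve.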
URL: From bsmith at mcs.anl.gov Fri Dec 17 12:45:22 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 17 Dec 2010 12:45:22 -0600 Subject: [petsc-users] Matrix setup and Distributed arrays, In-Reply-To: References: Message-ID: <72A66DA3-BB86-4276-9C2C-C680A0A23D6A@mcs.anl.gov> Send the code to petsc-maint at mcs.anl.gov likely you are not calling MatAssemblyBegin/End or something similar in the correct place. Barry On Dec 17, 2010, at 9:23 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I am trying to solve a simpel Poisson equation on regular grid (test case for my code). > This is my problem. Sorry for long discussion. I want to make it clear what I am doing. > > > 1- Create distributed array (same number of processors and parallel layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 > 2- Create distributed array (same number of processors and parallel layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 > 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, &A); > 4- Setup Matrix A in the corresponding fashion > > 4-1 For all the regular local nodes (based on DA_3D_STAR), simply use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. > 4-2 For some special nodes (still local though) use a 9-point interpolation equation. Theses nonzeros might extend outside of 1-layer of ghost nodes, also it might be in the direction not corresponding to the 7-point stencil. Hence, it would be possible that some rows in the matrix owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() does not work. > Here, the DA_3D_BOX comes useful. > 4-3 Return the global index of any node within the range of DA_3D_STAR, using the command > ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, &global_indices); > 4-4 Now, I can freely use MatSetValues() to insert the 9-point interpolation nonzeros into the matrix (using the returned global indices). > > I thought this should work since MatSetValues() does not care if it is inserting in the local part or the global section of the matrix on the neighboring processor. > However, for a very simple test case (solution of Poisson equation), this does not work in parallel. > I narrowed it down and came to a very simple but annoying conclusion. I don't think the test code that I have suffers from any bugs. > But this is what I got, > When a new nonzero is inserted in (even) one row of the matrix where the new nonzero corresponds to a node from a neighboring processor and outside the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. > Even if I insert a "zero" value, but at the given place, unfortunately I get wrong result. > > Note that, for a matrix of 10*10*10, I even tested that for one single row of a matrix, and unbelievably, I get wrong results. > I printed the matrix for two cases where there is a new (nonzero) in the row, and there is not! > Comparing the two matrices and using > > >>diff Bad Good > > This is the only existing difference: > > 150c150 > < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) (212, 0) > --- > > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > > > They are exactly the same, except that there is new zero at (212) which should not alter the results! > But I don't get the right result for the first case! > This is driving me crazy! I don't have any clue why this happens. > > I can send you the test code if you think it helps. 
> But to me, it sounds like when the matrix is created using DAGetMatrix(), there might be some sort of restriction to adding new nonzeros to the locations not defined based on the stencil and not within the range of the DA. > > In advance, thank you so much. > Mohamad > > > From bsmith at mcs.anl.gov Fri Dec 17 12:47:21 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 17 Dec 2010 12:47:21 -0600 Subject: [petsc-users] Why cannot ParMetis and Scotch be intergrated togther? In-Reply-To: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> References: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> Message-ID: They use to not be able to coexist in the same build. If Scotch has been properly fixed and you can figure out how to change Scotch's configuration to coexist we would be happy to accept the patch. We don't have the resources to fix a problem induced by someone else's very poor library design. Barry On Dec 17, 2010, at 11:53 AM, Keita Teranishi wrote: > Hi, > > I found PETSc?s configure script fails to integrate ParMetis and Scotch together. I think Scotch has a option to rename it?s ParMetis API to eliminate entry point duplications. Or is there any reason PETSc cannot put them together? > > Thanks, > ================================ > Keita Teranishi > Scientific Library Group > Cray, Inc. > keita at cray.com > ================================ > From sapphire.jxy at gmail.com Sun Dec 19 22:17:16 2010 From: sapphire.jxy at gmail.com (Xiaoyin Ji) Date: Sun, 19 Dec 2010 23:17:16 -0500 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Hi, I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ksp solvers...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Regards, Xiaoyin Ji From hzhang at mcs.anl.gov Mon Dec 20 09:12:09 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 20 Dec 2010 09:12:09 -0600 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Xiaoyin: > > I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ksp solvers...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Yes, you can. See "Shell Preconditioners" in petsc user manual. Also, check out ~petsc/src/ksp/ksp/examples/tutorials/ex15.c as an example. Hong From sapphire.jxy at gmail.com Mon Dec 20 09:47:49 2010 From: sapphire.jxy at gmail.com (Xiaoyin Ji) Date: Mon, 20 Dec 2010 10:47:49 -0500 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Hi, I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ks...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Regards, Xiaoyin Ji From knepley at gmail.com Mon Dec 20 09:51:47 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Dec 2010 07:51:47 -0800 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 7:47 AM, Xiaoyin Ji wrote: > Hi, > > I was wondering if petsc could convert an existing matrix into a > preconditioner and use it in ks...thought this shall be a straight-forward > question but I didn't find the answer in manuals. Thanks a lot. 
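Hong's pointer above to shell preconditioners (PCSHELL, ex15.c) boils down to the pattern sketched below: wrap the existing matrix in a PCSHELL whose apply routine multiplies by it, i.e. the matrix is treated as an (approximate) inverse. The names B and UserShellApply are illustrative, and the exact callback signature should be checked against ex15.c of the installed release (older releases passed a void* context instead of the PC).

/* apply the "preconditioner": y = B*x, with B playing the role of M^{-1} */
PetscErrorCode UserShellApply(PC pc,Vec x,Vec y)
{
  Mat            B;
  PetscErrorCode ierr;
  ierr = PCShellGetContext(pc,(void**)&B);CHKERRQ(ierr);
  ierr = MatMult(B,x,y);CHKERRQ(ierr);
  return 0;
}

ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc,(void*)B);CHKERRQ(ierr);    /* B is the existing matrix */
ierr = PCShellSetApply(pc,UserShellApply);CHKERRQ(ierr);

If instead the existing matrix is meant to be used to build a standard preconditioner (Jacobi, ILU, ...), it can simply be passed as the preconditioning-matrix argument of KSPSetOperators().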
> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/PC/PCSHELL.html Matt > Regards, > > Xiaoyin Ji -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 20 10:41:07 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 20 Dec 2010 10:41:07 -0600 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: <4CFF3D10.6000600@in.tum.de> References: <4CFE4FFE.9010602@in.tum.de> <4CFF3D10.6000600@in.tum.de> Message-ID: Tobias : > Ah, ok. Is there a possibility to hardwire that in the source code? We run > different integration tests with different solver/pc combinations with one > single executable call (currently without any options). Yes. We've added MatSuperluSetILUDropTol() procedural call to petsc-dev. See http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html on how to get petsc-dev. ~petsc-dev/src/ksp/ksp/examples/tutorials/ex52.c is an example on how to use it. > Ok, thanks. Will this functionality be available also in the next release > (p7) and when is this expected (approximately)? It will be included in next petsc release (v3.2). When? early next year I guess :-) Hong > > Thanks and best regards > Tobias > >>> we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor >>> problem in some of our test cases: We use seqaij matrix format and GMRES >>> with ILU dt. >>> >>> Our code contained a statement using PCFactorSetUseDropTolerance() which >>> does not exist any longer. So we renamed the function call to >>> PCFactorSetDropTolerance (same signature) which has no online docu but >>> found >>> by google ;-) Obviously, this function has not the same functionality as >>> our >>> tests fail. >>> >>> In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available >>> from PETSc" showed a line containing: >>> ILU dt: ILU dt ?seqaij ?Sparsekit (table survey) >>> >>> This is not included in the docu of 3.1-p6 >>> >>> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html >>> ) any more. Does that mean that the ILU dt version is not supported by >>> the >>> (default) petsc? I also checked the changelog but did not find anything >>> there; sorry if I missed sth. >>> >>> Any help or infos will be highly appreciated ;-) >>> >>> Thanks and best regards >>> Tobias >>> > From yjxd.chen at gmail.com Mon Dec 20 10:46:44 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 17:46:44 +0100 Subject: [petsc-users] Very poor speed up performance Message-ID: Hi everyone, I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A and right hand vector b are read from files. The dimension of A is 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been read correctly. I compiled the program with optimized version (--with-debugging=0), tested the speed up performance on two servers, and I have found that the performance is very poor. For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 cores. 
On each of them, with the increasing of computing cores k from 1 to 8 (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up will increase from 1 to 6, but when the computing cores k increase from 9 to 16(for the first server) or 48 (for the second server), the speed up decrease firstly and then remains a constant value 5.0 (for the first server) or 4.5(for the second server). Actually, the program LAMMPS speed up excellently on these two servers. Any comments are very appreciated! Thanks! -------------------------------------------------------------------------------------------------------------------------- PS: the related codes are as following, //firstly read A and b from files ... //then ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = VecAssemblyBegin(b); CHKERRQ(ierr); ierr = VecAssemblyEnd(b); CHKERRQ(ierr); ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr); ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); ierr = VecAssemblyBegin(x);CHKERRQ(ierr); ierr = VecAssemblyEnd(x);CHKERRQ(ierr); ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 20 11:06:32 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Dec 2010 09:06:32 -0800 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: > > Hi everyone, > > > I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A > and right hand vector b are read from files. The dimension of A is > 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been > read correctly. > > I compiled the program with optimized version (--with-debugging=0), tested > the speed up performance on two servers, and I have found that the > performance is very poor. > > For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 > cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 > cores. > > On each of them, with the increasing of computing cores k from 1 to 8 > (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up > will increase from 1 to 6, but when the computing cores k increase from 9 to > 16(for the first server) or 48 (for the second server), the speed up > decrease firstly and then remains a constant value 5.0 (for the first > server) or 4.5(for the second server). > We cannot say anything at all without -log_summary data for your runs. Matt > Actually, the program LAMMPS speed up excellently on these two servers. > > Any comments are very appreciated! Thanks! > > > > > -------------------------------------------------------------------------------------------------------------------------- > > PS: the related codes are as following, > > > //firstly read A and b from files > > ... 
> > //then > > > > ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); > CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = VecAssemblyBegin(b); CHKERRQ(ierr); > > ierr = VecAssemblyEnd(b); CHKERRQ(ierr); > > > > ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); > CHKERRQ(ierr); > > ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > > > ierr = > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > ierr = > KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > > > ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); > > > > ierr = VecAssemblyBegin(x);CHKERRQ(ierr); > > ierr = VecAssemblyEnd(x);CHKERRQ(ierr); > > ... > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 20 12:36:34 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 20 Dec 2010 12:36:34 -0600 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: <37E7E191-1C32-4082-A680-2B6ACD556895@mcs.anl.gov> See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers in particular note the discussion on memory bandwidth. Once you have started using multiple cores per CPU you will start to see very little speedup with Jacobi preconditioning since it is very memory bandwidth limited. In fact pretty much all sparse iterative solvers are memory bandwidth limited. Barry On Dec 20, 2010, at 10:46 AM, Yongjun Chen wrote: > > Hi everyone, > > > > I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A and right hand vector b are read from files. The dimension of A is 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been read correctly. > > I compiled the program with optimized version (--with-debugging=0), tested the speed up performance on two servers, and I have found that the performance is very poor. > > For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 cores. > > On each of them, with the increasing of computing cores k from 1 to 8 (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up will increase from 1 to 6, but when the computing cores k increase from 9 to 16(for the first server) or 48 (for the second server), the speed up decrease firstly and then remains a constant value 5.0 (for the first server) or 4.5(for the second server). > > Actually, the program LAMMPS speed up excellently on these two servers. > > Any comments are very appreciated! Thanks! > > > -------------------------------------------------------------------------------------------------------------------------- > > PS: the related codes are as following, > > > > //firstly read A and b from files > > ... 
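Barry's memory-bandwidth remark above can be made concrete with a rough, back-of-the-envelope estimate from the numbers in the logs that follow (the 12 bytes per stored nonzero is an assumption about AIJ/SBAIJ storage, not a measurement):

  stored nonzeros:   49,908,476  ->  about 0.6 GB of matrix data (8-byte value + 4-byte index each)
  k=2 MatMult:       1476 calls in about 3.4e+02 s  ->  roughly 0.23 s per multiply
  implied traffic:   0.6 GB / 0.23 s  ~  2.6 GB/s sustained by the 2 processes, for the matrix data alone

Perfect scaling to 16 processes would need roughly eight times that, i.e. well over 20 GB/s plus the vector traffic; once the memory system cannot deliver it, additional cores simply wait on memory, which matches the speed-up flattening out around 5-6.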
> > //then > > > ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = VecAssemblyBegin(b); CHKERRQ(ierr); > > ierr = VecAssemblyEnd(b); CHKERRQ(ierr); > > > ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr); > > ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > > ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > > ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); > > > ierr = VecAssemblyBegin(x);CHKERRQ(ierr); > > ierr = VecAssemblyEnd(x);CHKERRQ(ierr); > > ... > > > From yjxd.chen at gmail.com Mon Dec 20 12:38:31 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 19:38:31 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Hi Matt, Thanks for your reply. Just now I have carried out a series of tests with k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary option. From 8 cores to 12 cores, a small speed up has been found this time, but from 12 cores to 16 cores, the computation time increase! Attached please find these 5 log files. Thank you very much! mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary Here, I use ksp bicg instead of gmres, because the two ksp gives almost the same speed up performance, as I have tried many times. ---------------------- (1) k=2 ---------------------- Process 1 of total 2 on wmss04 Process 0 of total 2 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:42:23 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.25862e-06 Norm of error 1.25862e-06, Iterations 1475 ========================================================= The solver has finished successfully! ========================================================= The solving time is 762.874 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 17:55:06 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny Mon Dec 20 18:55:06 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 8.160e+02 1.00000 8.160e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 MPI Reductions: 4.483e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 100.0% 3.245e+05 100.0% 4.467e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 4.4e+03 92100100100 99 92100100100 99 811 PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 339744648 0 Vec 18 18 62239872 0 Vec Scatter 2 2 1736 0 Index Set 4 4 974736 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.21593e-06 Average time for MPI_Barrier(): 1.44005e-05 Average time for zero size MPI_Send(): 1.94311e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
------------------------------------------ ---------------------- (2) k=4 ---------------------- Process 0 of total 4 on wmss04 Process 2 of total 4 on wmss04 Process 3 of total 4 on wmss04 Process 1 of total 4 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:33:24 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28342e-06 Norm of error 1.28342e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 450.583 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 17:40:55 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Mon Dec 20 18:40:55 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 4.807e+02 1.00000 4.807e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 100.0% 2.658e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 91100100100 99 91100100100 99 1386 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 169902696 0 Vec 18 18 31282096 0 Vec Scatter 2 2 1736 0 Index Set 4 4 638616 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.5974e-06 Average time for MPI_Barrier(): 3.48091e-05 Average time for zero size MPI_Send(): 1.8537e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (3) k=8 ---------------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:14:59 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 311.937 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:20:11 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon Dec 20 19:20:11 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.330e+02 1.00000 3.330e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2024 PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 3.38554e-06 Average time for MPI_Barrier(): 7.40051e-05 Average time for zero size MPI_Send(): 1.88947e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (4) k=12 ---------------------- Process 1 of total 12 on wmss04 Process 5 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 0 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:56:36 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.503 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:01:28 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Mon Dec 20 19:01:28 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.089e+02 1.00012 3.089e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99 91100100100 99 2173 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.48499e-06 Average time for MPI_Barrier(): 0.000102377 Average time for zero size MPI_Send(): 2.15967e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (5) k=16 ---------------------- Process 0 of total 16 on wmss04 Process 8 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 11 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 7 of total 16 on wmss04 Process Process 15 of total 16 on wmss04 3Process 13 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 1 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 of total 16 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly.End Assembly. End Assembly.End Assembly.End Assembly.End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:02:28 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.15892e-06 Norm of error 1.15892e-06, Iterations 1497 ========================================================= The solver has finished successfully! ========================================================= The solving time is 337.91 seconds. The time accuracy is 1e-06 second. 
The current time is Mon Dec 20 18:08:06 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Mon Dec 20 19:08:06 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.534e+02 1.00001 3.534e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 MPI Reductions: 4.549e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 100.0% 1.819e+05 100.0% 4.533e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 92100100100 99 92100100100 99 1893 PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.10352e-06
Average time for MPI_Barrier(): 0.000129986
Average time for zero size MPI_Send(): 2.08169e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------

On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote:
> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote:
>>
>> Hi everyone,
>>
>> I use PETSc (version 3.1-p5) to solve a linear problem Ax=b. The matrix A
>> and the right-hand-side vector b are read from files. The dimension of A is
>> 1.2 million * 1.2 million. I am pretty sure the matrix A and vector b have
>> been read correctly.
>>
>> I compiled the program in its optimized version (--with-debugging=0) and
>> tested the speed-up on two servers, and I have found that the parallel
>> performance is very poor.
>>
>> Of the two servers, one has 4 CPUs with 4 cores per CPU, i.e., 16 cores in
>> total; the other has 4 CPUs with 12 cores per CPU, i.e., 48 cores in total.
>>
>> On each of them, as the number of computing cores k increases from 1 to 8
>> (mpiexec -n k ./Solver_MPI -pc_type jacobi -ksp_type gmres), the speed-up
>> increases from 1 to about 6; but as k increases further, from 9 up to 16
>> (on the first server) or 48 (on the second server), the speed-up first
>> decreases and then settles at a constant value of about 5.0 (first server)
>> or 4.5 (second server).
>>
> We cannot say anything at all without -log_summary data for your runs.
>
>    Matt
>
>> Actually, the program LAMMPS speeds up excellently on these two servers.
>>
>> Any comments are very much appreciated! Thanks!
>>
>> --------------------------------------------------------------------------------------------------------------------------
>>
>> PS: the related code is as follows,
>>
>> //firstly read A and b from files
>> ...
>>
>> //then
>>
>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr);
>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr);
>>
>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr);
>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr);
>>
>> ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
>> ierr = KSPGetPC(ksp,&pc); CHKERRQ(ierr);
>> ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT); CHKERRQ(ierr);
>> ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
>>
>> ierr = KSPSolve(ksp,b,x); CHKERRQ(ierr);
>>
>> ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
>>
>> ierr = KSPGetSolution(ksp, &x); CHKERRQ(ierr);
>>
>> ierr = VecAssemblyBegin(x); CHKERRQ(ierr);
>> ierr = VecAssemblyEnd(x); CHKERRQ(ierr);
>> ...
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener

--
Dr. Yongjun Chen
Room 2507, Building M
Institute of Materials Science and Technology
Technical University of Hamburg-Harburg
Eißendorfer Straße 42, 21073 Hamburg, Germany.
Tel: +49 (0)40-42878-4386
Fax: +49 (0)40-42878-4070
E-mail: yjxd.chen at gmail.com
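The -log_summary profiles above report every event under the single "Main Stage", so matrix/vector assembly and the KSP solve end up mixed in one event table. The summaries themselves note that stages can be separated with PetscLogStagePush() and PetscLogStagePop(). The following is only an illustrative, self-contained sketch, not the poster's solver: it builds a hypothetical toy diagonal system and assumes the PETSc 3.1-era C API used elsewhere in this thread, to show how assembly and solve could be logged as separate stages.

/* Sketch only (assumptions: PETSc 3.1-era C API, toy diagonal system).
 * Registers two logging stages so that -log_summary reports the
 * assembly phase and the solve phase in separate event sections. */
#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PetscLogStage  assembly, solve;
  PetscInt       i, n = 100, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char *)0, (char *)0); CHKERRQ(ierr);

  ierr = PetscLogStageRegister("Assembly", &assembly); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve",    &solve);    CHKERRQ(ierr);

  /* ---- Assembly stage: build a toy diagonal system A x = b ---- */
  ierr = PetscLogStagePush(assembly); CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    /* diagonal entries 1,2,3,... owned locally by each process */
    ierr = MatSetValue(A, i, i, (PetscScalar)(i + 1), INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
  ierr = VecSetSizes(b, PETSC_DECIDE, n); CHKERRQ(ierr);
  ierr = VecSetFromOptions(b); CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x); CHKERRQ(ierr);
  ierr = VecSet(b, 1.0); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  /* ---- Solve stage: everything in here is attributed to "Solve" ---- */
  ierr = PetscLogStagePush(solve); CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);   /* picks up -ksp_type / -pc_type */
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = KSPDestroy(ksp); CHKERRQ(ierr);
  ierr = VecDestroy(x); CHKERRQ(ierr);
  ierr = VecDestroy(b); CHKERRQ(ierr);
  ierr = MatDestroy(A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run, for example, as mpiexec -n 4 ./toy_solver -ksp_type bicg -pc_type jacobi -log_summary (the executable name here is hypothetical), and the resulting summary should then contain separate "Assembly" and "Solve" event sections, which makes it easier to see which phase loses parallel efficiency as the core count grows.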
------------------------------------------ -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:14:59 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 311.937 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:20:11 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon Dec 20 19:20:11 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.330e+02 1.00000 3.330e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2024 PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 3.38554e-06 Average time for MPI_Barrier(): 7.40051e-05 Average time for zero size MPI_Send(): 1.88947e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
------------------------------------------ -------------- next part -------------- Process 1 of total 12 on wmss04 Process 5 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 0 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:56:36 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.503 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:01:28 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Mon Dec 20 19:01:28 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.089e+02 1.00012 3.089e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99 91100100100 99 2173 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.48499e-06 Average time for MPI_Barrier(): 0.000102377 Average time for zero size MPI_Send(): 2.15967e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
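
The -log_summary runs attached above reduce to a simple strong-scaling table. Below is a minimal, self-contained C sketch (not part of the original logs) that recomputes speed-up and parallel efficiency from the solving times reported in these runs; the 2-process run is taken as the baseline because no serial timing is included, and the times are copied from the logs. The file name scaling_check.c is illustrative only.

/* scaling_check.c (illustrative name): strong-scaling summary of the
 * BiCG/Jacobi runs reported in the attached -log_summary output.
 * Times are the "solving time" values from the logs; the 2-process
 * run serves as the baseline, so ideal speed-up on p processes is p/2. */
#include <stdio.h>

int main(void)
{
  const int    nprocs[] = {2, 4, 8, 12};
  const double time_s[] = {762.874, 450.583, 311.937, 291.503}; /* from the logs */
  const int    n = (int)(sizeof(nprocs) / sizeof(nprocs[0]));
  int i;

  printf("procs   solve time (s)   speed-up   parallel efficiency\n");
  for (i = 0; i < n; i++) {
    double speedup    = time_s[0] / time_s[i];                       /* relative to 2 processes */
    double efficiency = speedup * (double)nprocs[0] / (double)nprocs[i];
    printf("%5d   %14.3f   %8.2f   %19.2f\n", nprocs[i], time_s[i], speedup, efficiency);
  }
  return 0;
}

With these numbers the parallel efficiency falls from roughly 85% on 4 processes to about 61% on 8 and 44% on 12, while the MatMult rate reported in the logs only climbs from 846 Mflop/s (2 processes) to 1494, 2031 and finally 2054 Mflop/s (12 processes). The flop rate plateaus even though cores are added, which is the memory-bandwidth limitation discussed in the reply below.
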
------------------------------------------

From bsmith at mcs.anl.gov Mon Dec 20 12:52:00 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 20 Dec 2010 12:52:00 -0600
Subject: [petsc-users] self-defined preconditioner available?
In-Reply-To: 
References: 
Message-ID: <24441805-6855-4ACC-9C47-A897E4C3F463@mcs.anl.gov>

   It is unclear what you mean. Do you mean use the MatMult() of an existing matrix as the preconditioner application for some other matrix?

   If so, use the PCType PCMAT and call KSPSetOperators() or SNESSetJacobian() with two different matrices: the first defines the linear system, and the second is the matrix you wish to apply directly as the preconditioner. See the manual page for PCMAT.

   If you mean something else, please explain.

   Barry

On Dec 19, 2010, at 10:17 PM, Xiaoyin Ji wrote:

> Hi,
>
> I was wondering if PETSc could convert an existing matrix into a preconditioner and use it in KSP solvers. I thought this should be a straightforward question, but I didn't find the answer in the manuals. Thanks a lot.
>
> Regards,
>
> Xiaoyin Ji

From knepley at gmail.com Mon Dec 20 13:21:17 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 20 Dec 2010 11:21:17 -0800
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen wrote:

> Hi Matt,
>
> Thanks for your reply. Just now I have carried out a series of tests with
> k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary
> option. From 8 cores to 12 cores, a small speed-up has been found this time,
> but from 12 cores to 16 cores, the computation time increases!
> Attached please find these 5 log files. Thank you very much!
>

It's very clear from these that Barry was right in his reply. These are
memory-bandwidth-limited computations, so if you don't get any more bandwidth
you will not speed up. This is rarely mentioned in sales pitches for multicore
computers. LAMMPS is not limited by bandwidth for most computations.

   Matt

> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary
> Here, I use KSP bicg instead of gmres, because the two KSP types give almost the
> same speed-up performance, as I have tried many times.
> ----------------------
> (1) k=2
> ----------------------
> Process 1 of total 2 on wmss04
> Process 0 of total 2 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 17:42:23 2010
>
> KSP Object:
>   type: bicg
>   maximum iterations=10000, initial guess is zero
>   tolerances: relative=1e-07, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:
>   type: jacobi
>   linear system matrix = precond matrix:
>   Matrix Object:
>     type=mpisbaij, rows=1177754, cols=1177754
>     total: nonzeros=49908476, allocated nonzeros=49908476
>     block size is 1
>
> norm(b-Ax)=1.25862e-06
> Norm of error 1.25862e-06, Iterations 1475
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 762.874 seconds.
> The time accuracy is 1e-06 second.
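
A minimal, self-contained sketch of the PCMAT usage Barry describes in his reply above, written against the PETSc 3.1-era C API used elsewhere in this thread (MatStructure argument to KSPSetOperators(), object-style destroy calls). The 5x5 Laplacian, the inverse-diagonal matrix P, and the file name pcmat_sketch.c are illustrative only, not taken from the thread:

/* pcmat_sketch.c (illustrative): use an existing matrix directly as a
 * preconditioner via PCMAT.  The second matrix given to KSPSetOperators()
 * is applied by MatMult() as the preconditioner. */
static char help[] = "Apply an existing matrix directly as a preconditioner via PCMAT.\n";

#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A, P;      /* A defines the linear system; P is applied as the preconditioner */
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, n = 5;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char *)0, help);CHKERRQ(ierr);

  /* A: 1D Laplacian (tridiagonal 2, -1), just to have something to solve */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, PETSC_NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* P: applied by MatMult() as the preconditioner, so it should approximate
     inv(A); here simply 1/diag(A) = 0.5 on the diagonal */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 1, PETSC_NULL, &P);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = MatSetValue(P, i, i, 0.5, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  /* first matrix defines the linear system, second is used as the preconditioner */
  ierr = KSPSetOperators(ksp, A, P, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMAT);CHKERRQ(ierr);     /* apply P directly via MatMult() */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  /* PETSc 3.1-style cleanup (these calls take the object itself in this release) */
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = MatDestroy(P);CHKERRQ(ierr);
  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = VecDestroy(b);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The point to note is that PCMAT applies the second matrix with MatMult() as the preconditioner itself, so the matrix handed in should approximate the inverse of the operator rather than the operator.
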
> The current time is Mon Dec 20 17:55:06 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny > Mon Dec 20 18:55:06 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 8.160e+02 1.00000 8.160e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 > Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 > MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 > MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 > MPI Reductions: 4.483e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 > 100.0% 3.245e+05 100.0% 4.467e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 > 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 > MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 > 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 > MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 > 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 > VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 > VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 > VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 > VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 > VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 > 4.4e+03 92100100100 99 92100100100 99 811 > PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 339744648 0 > Vec 18 18 62239872 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 974736 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 1.21593e-06 > Average time for MPI_Barrier(): 1.44005e-05 > Average time for zero size MPI_Send(): 1.94311e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 > Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 > 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux > Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized > Using PETSc arch: linux-gnu-c-opt > ----------------------------------------- > Using C compiler: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall > -Wwrite-strings -Wno-strict-aliasing -O > Using Fortran compiler: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall > -Wno-unused-variable -O > ----------------------------------------- > Using include paths: > -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include > -I/sun42/cheny/petsc-3.1-p5-optimized/include > -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include > ------------------------------------------ > Using C linker: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall > -Wwrite-strings -Wno-strict-aliasing -O > Using Fortran linker: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall > -Wno-unused-variable -O > Using libraries: > -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc > -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx > -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord > -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 > -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib > -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t > -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib > -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 > -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich > -lpthread -lrt -lgcc_s -ldl > ------------------------------------------ > > > ---------------------- > (2) k=4 > ---------------------- > Process 0 of total 4 on wmss04 > Process 2 of total 4 on wmss04 > Process 3 of total 4 on wmss04 > Process 1 of total 4 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 17:33:24 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.28342e-06 > Norm of error 1.28342e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 450.583 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 17:40:55 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny > Mon Dec 20 18:40:55 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 4.807e+02 1.00000 4.807e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 > Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 > MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 > MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 > 100.0% 2.658e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. 
len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 > 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 > MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 > 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 > MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 > 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 > VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 > VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 > VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 > VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 > VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 > KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 > 4.4e+03 91100100100 99 91100100100 99 1386 > PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 169902696 0 > Vec 18 18 31282096 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 638616 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 1.5974e-06 > Average time for MPI_Barrier(): 3.48091e-05 > Average time for zero size MPI_Send(): 1.8537e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > ---------------------- > (3) k=8 > ---------------------- > Process 0 of total 8 on wmss04 > Process 4 of total 8 on wmss04 > Process 2 of total 8 on wmss04 > Process 6 of total 8 on wmss04 > Process 3 of total 8 on wmss04 > Process 7 of total 8 on wmss04 > Process 1 of total 8 on wmss04 > Process 5 of total 8 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 18:14:59 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.32502e-06 > Norm of error 1.32502e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 311.937 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 18:20:11 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny > Mon Dec 20 19:20:11 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.330e+02 1.00000 3.330e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 > Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 > MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 > MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 > 100.0% 2.430e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 > 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 > MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 > 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 > MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 > 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 > VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 > VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 > VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 > VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 > VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 > KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 > 4.4e+03 90100100100 99 90100100100 99 2024 > PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 84944064 0 > Vec 18 18 15741712 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 409008 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 3.38554e-06 > Average time for MPI_Barrier(): 7.40051e-05 > Average time for zero size MPI_Send(): 1.88947e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > ---------------------- > (4) k=12 > ---------------------- > Process 1 of total 12 on wmss04 > Process 5 of total 12 on wmss04 > Process 2 of total 12 on wmss04 > Process 9 of total 12 on wmss04 > Process 6 of total 12 on wmss04 > Process 7 of total 12 on wmss04 > Process 10 of total 12 on wmss04 > Process 3 of total 12 on wmss04 > Process 11 of total 12 on wmss04 > Process 4 of total 12 on wmss04 > Process 8 of total 12 on wmss04 > Process 0 of total 12 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. > End Assembly. > End Assembly. > > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 17:56:36 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.28414e-06 > Norm of error 1.28414e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 291.503 seconds. > The time accuracy is 1e-06 second. 
> The current time is Mon Dec 20 18:01:28 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny > Mon Dec 20 19:01:28 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.089e+02 1.00012 3.089e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 > Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 > MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 > MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 > 100.0% 2.345e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 > 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 > MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 > 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 > MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 > 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 > VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 > VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 > VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 > VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 > VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 > KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 > 4.4e+03 91100100100 99 91100100100 99 2173 > PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 56593044 0 > Vec 18 18 10534536 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 305424 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 6.48499e-06 > Average time for MPI_Barrier(): 0.000102377 > Average time for zero size MPI_Send(): 2.15967e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > ---------------------- > (5) k=16 > ---------------------- > Process 0 of total 16 on wmss04 > Process 8 of total 16 on wmss04 > Process 4 of total 16 on wmss04 > Process 12 of total 16 on wmss04 > Process 2 of total 16 on wmss04 > Process 6 of total 16 on wmss04 > Process 5 of total 16 on wmss04 > Process 11 of total 16 on wmss04 > Process 14 of total 16 on wmss04 > Process 7 of total 16 on wmss04 > Process Process 15 of total 16 on wmss04 > 3Process 13 of total 16 on wmss04 > Process 10 of total 16 on wmss04 > Process 9 of total 16 on wmss04 > Process 1 of total 16 on wmss04 > The dimension of Matrix A is n = 1177754 > of total 16 on wmss04 > > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly.End Assembly. > End Assembly.End Assembly.End Assembly.End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. > > End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. 
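A side note on reading these summaries: every event in the tables lands in the single "Event Stage 0: Main Stage", so assembly and solve cannot be separated in the per-stage columns. The legend's hint about PetscLogStagePush() and PetscLogStagePop() refers to something like the following minimal sketch; this is illustrative only, not the poster's code, and the stage names plus the variables ierr, A, ksp, b, x stand for whatever the application already declares:

    PetscLogStage stageAssembly, stageSolve;  /* user-chosen logging stages */
    PetscErrorCode ierr;

    ierr = PetscLogStageRegister("Assembly", &stageAssembly); CHKERRQ(ierr);
    ierr = PetscLogStageRegister("Solve",    &stageSolve);    CHKERRQ(ierr);

    /* everything between Push and Pop is attributed to that stage */
    ierr = PetscLogStagePush(stageAssembly); CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);   CHKERRQ(ierr);
    ierr = PetscLogStagePop(); CHKERRQ(ierr);

    ierr = PetscLogStagePush(stageSolve); CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
    ierr = PetscLogStagePop(); CHKERRQ(ierr);

With stages registered this way, -log_summary prints a separate event table per stage, which makes it easier to confirm that the scaling plateau comes from the solve itself rather than from assembly or I/O.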
> > > > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 18:02:28 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.15892e-06 > Norm of error 1.15892e-06, Iterations 1497 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 337.91 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 18:08:06 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny > Mon Dec 20 19:08:06 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.534e+02 1.00001 3.534e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 > Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 > MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 > MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 > MPI Reductions: 4.549e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 > 100.0% 1.819e+05 100.0% 4.533e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 > 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 > MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 > 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 > MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 > 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 > VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 > VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 > VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 > VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 > VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 > KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 > 4.5e+03 92100100100 99 92100100100 99 1893 > PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 42424600 0 > Vec 18 18 7924896 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 247632 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 6.10352e-06 > Average time for MPI_Barrier(): 0.000129986 > Average time for zero size MPI_Send(): 2.08169e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > > On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote: > >> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: >> >>> >>> Hi everyone, >>> >>> >>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A >>> and right hand vector b are read from files. The dimension of A is >>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been >>> read correctly. >>> >>> I compiled the program with optimized version (--with-debugging=0), >>> tested the speed up performance on two servers, and I have found that the >>> performance is very poor. >>> >>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total >>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 >>> cores. >>> >>> On each of them, with the increasing of computing cores k from 1 to 8 >>> (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up >>> will increase from 1 to 6, but when the computing cores k increase from 9 to >>> 16(for the first server) or 48 (for the second server), the speed up >>> decrease firstly and then remains a constant value 5.0 (for the first >>> server) or 4.5(for the second server). >>> >> >> We cannot say anything at all without -log_summary data for your runs. >> >> Matt >> >> >>> Actually, the program LAMMPS speed up excellently on these two servers. >>> >>> Any comments are very appreciated! Thanks! >>> >>> >>> >>> >>> -------------------------------------------------------------------------------------------------------------------------- >>> >>> PS: the related codes are as following, >>> >>> >>> //firstly read A and b from files >>> >>> ... 
>>> >>> //then >>> >>> >>> >>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> >>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> >>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr); >>> >>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr); >>> >>> >>> >>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); >>> CHKERRQ(ierr); >>> >>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); >>> >>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >>> >>> >>> >>> ierr = >>> KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >>> >>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); >>> >>> ierr = >>> KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); >>> >>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>> >>> >>> >>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >>> >>> >>> >>> ierr = >>> KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>> >>> >>> >>> ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); >>> >>> >>> >>> ierr = VecAssemblyBegin(x);CHKERRQ(ierr); >>> >>> ierr = VecAssemblyEnd(x);CHKERRQ(ierr); >>> >>> ... >>> >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > Dr.Yongjun Chen > Room 2507, Building M > Institute of Materials Science and Technology > Technical University of Hamburg-Harburg > Ei?endorfer Stra?e 42, 21073 Hamburg, Germany. > Tel: +49 (0)40-42878-4386 > Fax: +49 (0)40-42878-4070 > E-mail: yjxd.chen at gmail.com > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From yjxd.chen at gmail.com Mon Dec 20 15:59:20 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 22:59:20 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and see what I can get. Yongjun On Mon, Dec 20, 2010 at 8:21 PM, Matthew Knepley wrote: > On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen wrote: > >> Hi Matt, >> >> Thanks for your reply. Just now I have carried out a series of tests with >> k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary >> option. From 8 cores to 12 cores, a small speed up has been found this time, >> but from 12 cores to 16 cores, the computation time increase! >> Attached please find these 5 log files. Thank you very much! >> > > Its very clear from these, but Barry was right in his reply. These are > memory bandwidth limited > computations, so if you don't get any more bandwidth you will not speed up. > This is rarely mentioned > in sales pitches for multicore computers. LAMMPS is not limited by > bandwidth for most computations. > > Matt > > >> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg >> -log_summary >> Here, I use ksp bicg instead of gmres, because the two ksp gives almost >> the same speed up performance, as I have tried many times. >> ---------------------- >> (1) k=2 >> ---------------------- >> Process 1 of total 2 on wmss04 >> Process 0 of total 2 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. 
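A rough back-of-the-envelope check of the memory-bandwidth argument above, using only numbers that appear in these logs plus one assumption: each stored nonzero costs about 12 bytes of matrix traffic per MatMult (an 8-byte double plus a 4-byte column index), which is an estimate, not something PETSc reports.

    flops per MatMult        ~ 4 * 49,908,476 stored nonzeros   ~ 2.0e8
    aggregate MatMult rate   ~ 2.0e3 Mflop/s at 8-16 cores (per the tables)
    MatMults per second      ~ 2.0e9 / 2.0e8                    ~ 10
    matrix bytes per MatMult ~ 12 * 49.9e6                      ~ 0.6 GB
    sustained matrix traffic ~ 10 * 0.6 GB                      ~ 6 GB/s or more

That is of the order of what a single memory bus of this vintage can stream, and MatMult plus MatMultTranspose account for roughly 70-80% of the solve time in every run, so once that bandwidth is saturated, adding more cores on the same box cannot shorten the solve; this is consistent with the plateau seen in the timings below.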
>> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:42:23 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.25862e-06 >> Norm of error 1.25862e-06, Iterations 1475 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 762.874 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:55:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny >> Mon Dec 20 18:55:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 8.160e+02 1.00000 8.160e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 >> Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 >> MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 >> MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 >> MPI Reductions: 4.483e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 >> 100.0% 3.245e+05 100.0% 4.467e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 >> MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 >> MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 >> 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 >> VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 >> VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 >> VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 >> VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 >> VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 >> 4.4e+03 92100100100 99 92100100100 99 811 >> PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 339744648 0 >> Vec 18 18 62239872 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 974736 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.21593e-06 >> Average time for MPI_Barrier(): 1.44005e-05 >> Average time for zero size MPI_Send(): 1.94311e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 >> Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 >> 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux >> Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized >> Using PETSc arch: linux-gnu-c-opt >> ----------------------------------------- >> Using C compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> ----------------------------------------- >> Using include paths: >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> ------------------------------------------ >> Using C linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> Using libraries: >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx >> -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord >> -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 >> -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib >> -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t >> 
-L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib >> -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 >> -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich >> -lpthread -lrt -lgcc_s -ldl >> ------------------------------------------ >> >> >> ---------------------- >> (2) k=4 >> ---------------------- >> Process 0 of total 4 on wmss04 >> Process 2 of total 4 on wmss04 >> Process 3 of total 4 on wmss04 >> Process 1 of total 4 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:33:24 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28342e-06 >> Norm of error 1.28342e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 450.583 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:40:55 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny >> Mon Dec 20 18:40:55 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 4.807e+02 1.00000 4.807e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 >> Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 >> MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 >> MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 >> 100.0% 2.658e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. 
>> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 >> MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 >> MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 >> 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 >> VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 >> VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 >> VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 >> VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 >> VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 >> KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 >> 4.4e+03 91100100100 99 91100100100 99 1386 >> PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. 
>> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 169902696 0 >> Vec 18 18 31282096 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 638616 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.5974e-06 >> Average time for MPI_Barrier(): 3.48091e-05 >> Average time for zero size MPI_Send(): 1.8537e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (3) k=8 >> ---------------------- >> Process 0 of total 8 on wmss04 >> Process 4 of total 8 on wmss04 >> Process 2 of total 8 on wmss04 >> Process 6 of total 8 on wmss04 >> Process 3 of total 8 on wmss04 >> Process 7 of total 8 on wmss04 >> Process 1 of total 8 on wmss04 >> Process 5 of total 8 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:14:59 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.32502e-06 >> Norm of error 1.32502e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 311.937 seconds. >> The time accuracy is 1e-06 second. 
>> The current time is Mon Dec 20 18:20:11 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny >> Mon Dec 20 19:20:11 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.330e+02 1.00000 3.330e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 >> Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 >> MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 >> MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 >> 100.0% 2.430e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 >> MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 >> MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 >> VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 >> VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 >> VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 >> VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 >> VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 >> KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 >> 4.4e+03 90100100100 99 90100100100 99 2024 >> PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 84944064 0 >> Vec 18 18 15741712 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 409008 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.38554e-06 >> Average time for MPI_Barrier(): 7.40051e-05 >> Average time for zero size MPI_Send(): 1.88947e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (4) k=12 >> ---------------------- >> Process 1 of total 12 on wmss04 >> Process 5 of total 12 on wmss04 >> Process 2 of total 12 on wmss04 >> Process 9 of total 12 on wmss04 >> Process 6 of total 12 on wmss04 >> Process 7 of total 12 on wmss04 >> Process 10 of total 12 on wmss04 >> Process 3 of total 12 on wmss04 >> Process 11 of total 12 on wmss04 >> Process 4 of total 12 on wmss04 >> Process 8 of total 12 on wmss04 >> Process 0 of total 12 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:56:36 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28414e-06 >> Norm of error 1.28414e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 291.503 seconds. 
>> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:01:28 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny >> Mon Dec 20 19:01:28 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.089e+02 1.00012 3.089e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 >> Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 >> MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 >> MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 >> 100.0% 2.345e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 >> MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 >> MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 >> VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 >> VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 >> VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 >> VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 >> VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 >> KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 >> 4.4e+03 91100100100 99 91100100100 99 2173 >> PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 56593044 0 >> Vec 18 18 10534536 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 305424 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.48499e-06 >> Average time for MPI_Barrier(): 0.000102377 >> Average time for zero size MPI_Send(): 2.15967e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> ---------------------- >> (5) k=16 >> ---------------------- >> Process 0 of total 16 on wmss04 >> Process 8 of total 16 on wmss04 >> Process 4 of total 16 on wmss04 >> Process 12 of total 16 on wmss04 >> Process 2 of total 16 on wmss04 >> Process 6 of total 16 on wmss04 >> Process 5 of total 16 on wmss04 >> Process 11 of total 16 on wmss04 >> Process 14 of total 16 on wmss04 >> Process 7 of total 16 on wmss04 >> Process Process 15 of total 16 on wmss04 >> 3Process 13 of total 16 on wmss04 >> Process 10 of total 16 on wmss04 >> Process 9 of total 16 on wmss04 >> Process 1 of total 16 on wmss04 >> The dimension of Matrix A is n = 1177754 >> of total 16 on wmss04 >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly.End Assembly. >> End Assembly.End Assembly.End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. 
>> >> >> >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:02:28 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.15892e-06 >> Norm of error 1.15892e-06, Iterations 1497 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 337.91 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:08:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny >> Mon Dec 20 19:08:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.534e+02 1.00001 3.534e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 >> Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 >> MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 >> MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 >> MPI Reductions: 4.549e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 >> 100.0% 1.819e+05 100.0% 4.533e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 >> MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 >> MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 >> 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 >> VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 >> VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 >> VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 >> VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 >> VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 >> KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 >> 4.5e+03 92100100100 99 92100100100 99 1893 >> PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 42424600 0 >> Vec 18 18 7924896 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 247632 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.10352e-06 >> Average time for MPI_Barrier(): 0.000129986 >> Average time for zero size MPI_Send(): 2.08169e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> >> On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote: >> >>> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: >>> >>>> >>>> Hi everyone, >>>> >>>> >>>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix >>>> A and right hand vector b are read from files. The dimension of A is >>>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been >>>> read correctly. >>>> >>>> I compiled the program with optimized version (--with-debugging=0), >>>> tested the speed up performance on two servers, and I have found that the >>>> performance is very poor. >>>> >>>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total >>>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 >>>> cores. >>>> >>>> On each of them, with the increasing of computing cores k from 1 to 8 >>>> (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up >>>> will increase from 1 to 6, but when the computing cores k increase from 9 to >>>> 16(for the first server) or 48 (for the second server), the speed up >>>> decrease firstly and then remains a constant value 5.0 (for the first >>>> server) or 4.5(for the second server). >>>> >>> >>> We cannot say anything at all without -log_summary data for your runs. >>> >>> Matt >>> >>> >>>> Actually, the program LAMMPS speed up excellently on these two >>>> servers. >>>> >>>> Any comments are very appreciated! Thanks! >>>> >>>> >>>> >>>> >>>> -------------------------------------------------------------------------------------------------------------------------- >>>> >>>> PS: the related codes are as following, >>>> >>>> >>>> //firstly read A and b from files >>>> >>>> ... 
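/* Illustrative sketch only, not the poster's elided loading code: one common way
   to fill the "read A and b from files" step with the PETSc 3.1 calling sequence
   used in this thread, assuming A and b were written earlier with MatView()/VecView()
   into a binary viewer.  The file name "system.dat" and the MATMPIAIJ type are
   assumptions; PETSc releases after 3.1 changed the calling sequence to
   MatLoad(A,viewer) / VecLoad(b,viewer).  The A, b and ierr names are reused from
   the surrounding fragment. */
PetscViewer fd;
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"system.dat",FILE_MODE_READ,&fd);CHKERRQ(ierr);
ierr = MatLoad(fd,MATMPIAIJ,&A);CHKERRQ(ierr);    /* creates and fills A */
ierr = VecLoad(fd,PETSC_NULL,&b);CHKERRQ(ierr);   /* creates and fills b */
ierr = PetscViewerDestroy(fd);CHKERRQ(ierr);      /* destroy-by-value form used in 3.1 */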
>>>> >>>> //then >>>> >>>> >>>> >>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); >>>> >>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >>>> >>>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); >>>> >>>> ierr = >>>> KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); >>>> >>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = VecAssemblyBegin(x);CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(x);CHKERRQ(ierr); >>>> >>>> ... >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> >> >> -- >> Dr.Yongjun Chen >> Room 2507, Building M >> Institute of Materials Science and Technology >> Technical University of Hamburg-Harburg >> Ei?endorfer Stra?e 42, 21073 Hamburg, Germany. >> Tel: +49 (0)40-42878-4386 >> Fax: +49 (0)40-42878-4070 >> E-mail: yjxd.chen at gmail.com >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Dec 20 16:04:37 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 20 Dec 2010 16:04:37 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: On Mon, 20 Dec 2010, Yongjun Chen wrote: > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and > see what I can get. hydra is just the process manager. Also --download-mpich uses a slightly older version - with device=ch3:sock for portability and valgrind reasons [development] You might want to install latest mpich manually with the defaut device=ch3:nemsis and recheck.. satish From yjxd.chen at gmail.com Mon Dec 20 16:12:01 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 23:12:01 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Satish, many thanks for your advice! On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote: > On Mon, 20 Dec 2010, Yongjun Chen wrote: > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly > and > > see what I can get. > > hydra is just the process manager. > > Also --download-mpich uses a slightly older version - with > device=ch3:sock for portability and valgrind reasons [development] > > You might want to install latest mpich manually with the defaut > device=ch3:nemsis and recheck.. > > satish > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thomas.witkowski at tu-dresden.de Tue Dec 21 07:49:58 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Tue, 21 Dec 2010 14:49:58 +0100 Subject: [petsc-users] ParMETIS question Message-ID: <4D10B086.2090503@tu-dresden.de> Hi, I have a not directly PETSc related question, but I hope to get some answer from the community here. In my FEM code, I make use of ParMETIS to partition the mesh. I make direct use of this library and not of PETSc's ParMETIS integration. The initial partition is always fine, but I use the ParMETIS_V3_AdaptiveRepart function for repartition the mesh due to local mesh adaption. In most cases, the result is fine, but there are two points, where I have trouble with: 1) Sometimes ParMETIS generates empty partitions, i.e., a processor has zero mesh elements. This is something my code cannot handle. Is this a bug or a feature? If it is a feature, is there any possiblity to disable it? 2) In most cases the specific partitions are not connected. If I put all data to ParMETIS in a correct way, is this okay? My code can handle it, but is slows down the computation due to larger interior boundaries and therefore to more communications. Does anyone of you know an answer to these question? Is there a debug mode in ParMETIS, where I can see which data is set to its function calls? Regards, Thomas From knepley at gmail.com Tue Dec 21 10:31:01 2010 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Dec 2010 08:31:01 -0800 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D10B086.2090503@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> Message-ID: On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Hi, > > I have a not directly PETSc related question, but I hope to get some answer > from the community here. In my FEM code, I make use of ParMETIS to partition > the mesh. I make direct use of this library and not of PETSc's ParMETIS > integration. The initial partition is always fine, but I use the > ParMETIS_V3_AdaptiveRepart function for repartition the mesh due to local > mesh adaption. In most cases, the result is fine, but there are two points, > where I have trouble with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a processor has > zero mesh elements. This is something my code cannot handle. Is this a bug > or a feature? If it is a feature, is there any possiblity to disable it? > ParMetis has a balance constraint if you weight vertices. This will enforce equal size partitions. > 2) In most cases the specific partitions are not connected. If I put all > data to ParMETIS in a correct way, is this okay? My code can handle it, but > is slows down the computation due to larger interior boundaries and > therefore to more communications. > ParMetis minimizes the overall boundary size, so I do not understand how you could see this slowdown. Matt > Does anyone of you know an answer to these question? Is there a debug mode > in ParMETIS, where I can see which data is set to its function calls? > > Regards, > > Thomas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Tue Dec 21 12:53:52 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 12:53:52 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. 
Message-ID: Hi all, I am running a linear problem discretized with FEM on a diffusion reaction system, with discontinuous source distribution. When I run FGMRes with geometric multigrid as its preconditioner, I notice that every time after the restart in fgmres, the new residual is orders of magnitude higher than the previous iteration. I might be wrong on this but should the restart not preserve monotonicity in convergence ? Or am I thinking of a different variant of Gmres here. Here's the residual norm as a function of iteration with number of restarts=50. 40 KSP Residual norm 2.489810374358e-06 41 KSP Residual norm 1.585813670005e-06 42 KSP Residual norm 1.059211836025e-06 43 KSP Residual norm 6.701461059247e-07 44 KSP Residual norm 4.127634824940e-07 45 KSP Residual norm 2.511364148934e-07 46 KSP Residual norm 1.307034672896e-07 47 KSP Residual norm 7.105770015635e-08 48 KSP Residual norm 4.098578230710e-08 49 KSP Residual norm 2.426160176080e-08 ------------------------------------------------------------------------- 50 KSP Residual norm 1.864914790828e+02 51 KSP Residual norm 6.741080961009e+01 52 KSP Residual norm 5.191621875736e+01 53 KSP Residual norm 4.513782866249e+01 54 KSP Residual norm 3.320195603375e+01 55 KSP Residual norm 2.699941296855e+01 56 KSP Residual norm 1.707998091297e+01 57 KSP Residual norm 1.219599670348e+01 Any suggestions to obtain a more smoother convergence would be much appreciated. Thank you, Vijay From jed at 59A2.org Tue Dec 21 13:10:38 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 20:10:38 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 19:53, Vijay S. Mahadevan wrote: > I am running a linear problem discretized with FEM on a diffusion > reaction system, with discontinuous source distribution. When I run > FGMRes with geometric multigrid as its preconditioner, I notice that > every time after the restart in fgmres, the new residual is orders of > magnitude higher than the previous iteration. I might be wrong on this > but should the restart not preserve monotonicity in convergence ? Or > am I thinking of a different variant of Gmres here. > It is not possible to guarantee monotonicity for nonsymmetric matrices without storing the full subspace. There is no variant of GMRES, or any Krylov method for that matter, that can do what you want. You are seeing a particularly large jump, if you actually have a linear preconditioner (if you don't use Krylov cycles inside your smoothers) then you might try using bcgs or some variant thereof which would avoid the high cost of restart. Or you could stop using restarts, it looks like you were getting close to an adequate tolerance. Or find a way to make the preconditioner strong enough to converge in a reasonable number of iterations. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 21 14:04:07 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:04:07 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. 
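One way to see what the solve has really achieved, independent of whatever norm the Krylov method reports, is to form ||b - A x|| explicitly after KSPSolve(). A minimal sketch, reusing the A, b, x and ierr names from the code fragment quoted earlier in this thread (the work vector r and the two norm variables are new names introduced here for illustration), is:

Vec       r;
PetscReal rnorm,bnorm;
ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
ierr = MatMult(A,x,r);CHKERRQ(ierr);              /* r = A x     */
ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);           /* r = b - A x */
ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);
ierr = VecNorm(b,NORM_2,&bnorm);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD,"||b - A x|| = %g   ||b - A x||/||b|| = %g\n",
                   (double)rnorm,(double)(rnorm/bnorm));CHKERRQ(ierr);
ierr = VecDestroy(r);CHKERRQ(ierr);               /* PETSc 3.1 destroy-by-value form */

The -ksp_monitor_true_residual option mentioned below prints essentially this quantity at every iteration.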
Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. Barry On Dec 21, 2010, at 12:53 PM, Vijay S. Mahadevan wrote: > Hi all, > > I am running a linear problem discretized with FEM on a diffusion > reaction system, with discontinuous source distribution. When I run > FGMRes with geometric multigrid as its preconditioner, I notice that > every time after the restart in fgmres, the new residual is orders of > magnitude higher than the previous iteration. I might be wrong on this > but should the restart not preserve monotonicity in convergence ? Or > am I thinking of a different variant of Gmres here. Here's the > residual norm as a function of iteration with number of restarts=50. > > 40 KSP Residual norm 2.489810374358e-06 > 41 KSP Residual norm 1.585813670005e-06 > 42 KSP Residual norm 1.059211836025e-06 > 43 KSP Residual norm 6.701461059247e-07 > 44 KSP Residual norm 4.127634824940e-07 > 45 KSP Residual norm 2.511364148934e-07 > 46 KSP Residual norm 1.307034672896e-07 > 47 KSP Residual norm 7.105770015635e-08 > 48 KSP Residual norm 4.098578230710e-08 > 49 KSP Residual norm 2.426160176080e-08 > ------------------------------------------------------------------------- > 50 KSP Residual norm 1.864914790828e+02 > 51 KSP Residual norm 6.741080961009e+01 > 52 KSP Residual norm 5.191621875736e+01 > 53 KSP Residual norm 4.513782866249e+01 > 54 KSP Residual norm 3.320195603375e+01 > 55 KSP Residual norm 2.699941296855e+01 > 56 KSP Residual norm 1.707998091297e+01 > 57 KSP Residual norm 1.219599670348e+01 > > Any suggestions to obtain a more smoother convergence would be much > appreciated. Thank you, > > Vijay From jed at 59A2.org Tue Dec 21 14:08:01 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:08:01 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: > This is a sign that the preconditioner is seriously messed up and should > not be used in its current form. It can happen if the matrix is nearly > singular and for example you use an incomplete factorization for a > preconditioner that just screws up the scaling like totally. Run with > -ksp_monitor_true_residual and you'll see that the solver is not really > solving the problem even though it thinks it is converging fine. FGMRES only does right preconditioning so it should be showing the true residual. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Tue Dec 21 14:16:25 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:16:25 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: Jed, I ask because after the restart, the residual changes 10 orders of magnitude and a-priori, it is quite hard to decide the restart number. Yes in the test case I presented, the residual gets close enough to the tolerance and I can afford few more vector storage but for a much refined problem, this might not be the case and so it worries me. My initial tests with bcgs were not satisfactory (very bad convergence as compared to gmres) but I tried GCR just now and it seems to converge correctly to the right solution, monotonically for the same problem. Alternatively, yes, I could make my preconditioner stronger (add more levels, more smoothing steps etc..) to converge within the restart limit. 
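For reference, the restart length and the Krylov method itself can be changed at run time rather than fixed in the code. A minimal sketch, assuming only the KSP object named ksp from the earlier code fragment (the restart value 200 is purely illustrative), is:

ierr = KSPSetType(ksp,KSPFGMRES);CHKERRQ(ierr);     /* flexible GMRES, as already used here       */
ierr = KSPGMRESSetRestart(ksp,200);CHKERRQ(ierr);   /* enlarge the restart space                  */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);        /* still honour runtime overrides             */

The same settings are available as the command-line options -ksp_type fgmres (or -ksp_type gcr) and -ksp_gmres_restart <n>, so different restart lengths can be compared without recompiling.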
Barry, the matrix is not nearly singular although I have not yet looked at the effectiveness of the preconditoner thoroughly yet. It is possible that the preconditioned operator might have some undesired properties. Just to compare, the same linear system without any preconditioning takes about 2400 iterations and maybe that gives some ball park metric on the efficiency of the preconditioner.. Let me know if you want to know some other specific information to better understand the system. Vijay On Tue, Dec 21, 2010 at 1:10 PM, Jed Brown wrote: > On Tue, Dec 21, 2010 at 19:53, Vijay S. Mahadevan wrote: >> >> I am running a linear problem discretized with FEM on a diffusion >> reaction system, with discontinuous source distribution. When I run >> FGMRes with geometric multigrid as its preconditioner, I notice that >> every time after the restart in fgmres, the new residual is orders of >> magnitude higher than the previous iteration. I might be wrong on this >> but should the restart not preserve monotonicity in convergence ? Or >> am I thinking of a different variant of Gmres here. > > It is not possible to guarantee monotonicity for nonsymmetric matrices > without storing the full subspace. ?There is no variant of GMRES, or any > Krylov method for that matter, that can do what you want. ?You are seeing a > particularly large jump, if you actually have a linear preconditioner (if > you don't use Krylov cycles inside your smoothers) then you might try using > bcgs or some variant thereof which would avoid the high cost of restart. ?Or > you could stop using restarts, it looks like you were getting close to an > adequate tolerance. ?Or find a way to make the preconditioner strong enough > to converge in a reasonable number of iterations. From bsmith at mcs.anl.gov Tue Dec 21 14:23:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:23:16 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: > On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: > This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. > > FGMRES only does right preconditioning so it should be showing the true residual. No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". Barry From vijay.m at gmail.com Tue Dec 21 14:26:40 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:26:40 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: Barry, I tried with the true_residual_norm option and it gives me the exact same convergence as the one I have shown before. 
45 KSP Residual norm 2.511364148934e-07 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 46 KSP Residual norm 1.307034672896e-07 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 47 KSP Residual norm 7.105770015635e-08 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 48 KSP Residual norm 4.098578230710e-08 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 49 KSP Residual norm 2.426160176080e-08 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 50 KSP Residual norm 1.864914790828e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 51 KSP Residual norm 6.741080961009e+01 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 52 KSP Residual norm 5.191621875736e+01 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 53 KSP Residual norm 4.513782866249e+01 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 Vijay On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: > > On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: > >> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >> >> FGMRES only does right preconditioning so it should be showing the true residual. > > ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". > > ? 
Barry > > > From jed at 59A2.org Tue Dec 21 14:28:14 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:28:14 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 21:16, Vijay S. Mahadevan wrote: > Jed, I ask because after the restart, the residual changes 10 orders > of magnitude and a-priori, it is quite hard to decide the restart > number. Yes in the test case I presented, the residual gets close > enough to the tolerance and I can afford few more vector storage but > for a much refined problem, this might not be the case and so it > worries me. > What happens if you run with -ksp_gmres_modifiedgramschmidt? This is slow in parallel, but provides insight into what is causing the problem. My initial tests with bcgs were not satisfactory (very bad convergence > as compared to gmres) but I tried GCR just now and it seems to > converge correctly to the right solution, monotonically for the same > problem. > GCR provides a cheap way to access the solution, see what it does with monitor_true_residual. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Dec 21 14:28:34 2010 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 21 Dec 2010 12:28:34 -0800 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: Vijay, You should definitely follow Barry's suggestion and monitor the true residual (using -ksp_monitor_true_residual). Jed, I know this may sound odd given FGMRES uses right preconditioning, but I've seen that when the system is badly scaled, the preconditioned residual and the true residual reported by -ksp_monitor_true_residual can drift from one another. In such situations with the reported numbers are initially the same, but after a number of iterations, the preconditioned residual may continue to decrease but the true residual actually stagnates. Cheers, Dave On 21 December 2010 12:08, Jed Brown wrote: > On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >> >> This is a sign that the preconditioner is seriously messed up and should >> not be used in its current form. ?It can happen if the matrix is nearly >> singular and for example you use an incomplete factorization for a >> preconditioner that just screws up the scaling like totally. Run with >> -ksp_monitor_true_residual and you'll see that the solver is not really >> solving the problem even though it thinks it is converging fine. > > FGMRES only does right preconditioning so it should be showing the true > residual. From jed at 59A2.org Tue Dec 21 14:29:31 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:29:31 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: On Tue, Dec 21, 2010 at 21:26, Vijay S. Mahadevan wrote: > 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm > 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 > The true residual is huge, this is not converging. You probably have a singular preconditioner. What are you using (-ksp_view) and what system are you solving with what discretization. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 21 14:30:25 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:30:25 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. 
In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. Barry On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: > Barry, I tried with the true_residual_norm option and it gives me the > exact same convergence as the one I have shown before. > > 45 KSP Residual norm 2.511364148934e-07 > 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm > 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 > 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm > 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 > 46 KSP Residual norm 1.307034672896e-07 > 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm > 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 > 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm > 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 > 47 KSP Residual norm 7.105770015635e-08 > 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm > 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 > 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm > 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 > 48 KSP Residual norm 4.098578230710e-08 > 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm > 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 49 KSP Residual norm 2.426160176080e-08 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 50 KSP Residual norm 1.864914790828e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 51 KSP Residual norm 6.741080961009e+01 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > 52 KSP Residual norm 5.191621875736e+01 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm > 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 > 53 KSP Residual norm 4.513782866249e+01 > 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm > 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 > 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm > 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 > > > Vijay > > On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >> >> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >> >>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. 
Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>> >>> FGMRES only does right preconditioning so it should be showing the true residual. >> >> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >> >> Barry >> >> >> From vijay.m at gmail.com Tue Dec 21 14:46:20 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:46:20 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: > Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. Ah yes. I was reading the output wrongly. Thanks for pointing that out. So then it is quite possible that my preconditioner is terrible for this problem. Curiously with GCR, the true residual does converge. 62 KSP Residual norm 6.845396874593e-10 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 63 KSP Residual norm 4.617426258215e-10 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 64 KSP Residual norm 3.659090331422e-10 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 65 KSP Residual norm 2.457005532004e-10 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 66 KSP Residual norm 1.765446010945e-10 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 Jed, with modified gram schmidt procedure, fgmres yields the following, which looks like the same as before: 49 KSP Residual norm 2.426160176080e-08 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 50 KSP Residual norm 1.864914790828e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 51 KSP Residual norm 6.741080961009e+01 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 But I generally see that the true residual of GCR seems to converge to desired tolerance but for GMRES, the convergence stagnates with different options on my MG preconditioner. 
This is puzzling to me since I spent enough time making sure that the preconditioner was working correctly but I will look more into this now. Thanks for all the helpful comments guys ! I will post here if I find any other curious behavior. Vijay On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: > > ?Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. > > ?Barry > > On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: > >> Barry, I tried with the true_residual_norm option and it gives me the >> exact same convergence as the one I have shown before. >> >> 45 KSP Residual norm 2.511364148934e-07 >> ? 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >> 46 KSP Residual norm 1.307034672896e-07 >> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >> 47 KSP Residual norm 7.105770015635e-08 >> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >> 48 KSP Residual norm 4.098578230710e-08 >> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> 49 KSP Residual norm 2.426160176080e-08 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> 50 KSP Residual norm 1.864914790828e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> 51 KSP Residual norm 6.741080961009e+01 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> 52 KSP Residual norm 5.191621875736e+01 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >> 53 KSP Residual norm 4.513782866249e+01 >> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >> ? 
54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >> >> >> Vijay >> >> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>> >>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>> >>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>> >>>> FGMRES only does right preconditioning so it should be showing the true residual. >>> >>> ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>> >>> ? Barry >>> >>> >>> > > From bsmith at mcs.anl.gov Tue Dec 21 14:52:20 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:52:20 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). Barry On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: >> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. > Ah yes. I was reading the output wrongly. Thanks for pointing that > out. So then it is quite possible that my preconditioner is terrible > for this problem. > > Curiously with GCR, the true residual does converge. 
> > 62 KSP Residual norm 6.845396874593e-10 > 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm > 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 > 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm > 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 > 63 KSP Residual norm 4.617426258215e-10 > 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm > 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 > 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm > 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 > 64 KSP Residual norm 3.659090331422e-10 > 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm > 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 > 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm > 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 > 65 KSP Residual norm 2.457005532004e-10 > 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm > 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 > 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm > 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 > 66 KSP Residual norm 1.765446010945e-10 > 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm > 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 > > Jed, with modified gram schmidt procedure, fgmres yields the > following, which looks like the same as before: > > 49 KSP Residual norm 2.426160176080e-08 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 50 KSP Residual norm 1.864914790828e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 51 KSP Residual norm 6.741080961009e+01 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > > But I generally see that the true residual of GCR seems to converge to > desired tolerance but for GMRES, the convergence stagnates with > different options on my MG preconditioner. This is puzzling to me > since I spent enough time making sure that the preconditioner was > working correctly but I will look more into this now. Thanks for all > the helpful comments guys ! I will post here if I find any other > curious behavior. > > Vijay > > On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >> >> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >> >> Barry >> >> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >> >>> Barry, I tried with the true_residual_norm option and it gives me the >>> exact same convergence as the one I have shown before. 
>>> >>> 45 KSP Residual norm 2.511364148934e-07 >>> 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>> 46 KSP Residual norm 1.307034672896e-07 >>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>> 47 KSP Residual norm 7.105770015635e-08 >>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>> 48 KSP Residual norm 4.098578230710e-08 >>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 49 KSP Residual norm 2.426160176080e-08 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 50 KSP Residual norm 1.864914790828e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 51 KSP Residual norm 6.741080961009e+01 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> 52 KSP Residual norm 5.191621875736e+01 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>> 53 KSP Residual norm 4.513782866249e+01 >>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>> 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>> >>> >>> Vijay >>> >>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>> >>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>> >>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>> >>>>> FGMRES only does right preconditioning so it should be showing the true residual. >>>> >>>> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. 
So in extreme circumstances the recursively compute one generates "garbage". >>>> >>>> Barry >>>> >>>> >>>> >> >> From vijay.m at gmail.com Tue Dec 21 15:16:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 15:16:37 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: Also GCR seems to use and allocate comparatively more vectors, translating to lot more memory. This does make FGMRES more attractive. I will look at the preconditioner and try to find the true cause of the issue. Cheers, Vijay On Tue, Dec 21, 2010 at 2:52 PM, Barry Smith wrote: > > ?The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). > > ? Barry > > > On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: > >>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >> Ah yes. I was reading the output wrongly. Thanks for pointing that >> out. So then it is quite possible that my preconditioner is terrible >> for this problem. >> >> Curiously with GCR, the true residual does converge. >> >> 62 KSP Residual norm 6.845396874593e-10 >> ? 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm >> 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 >> ? 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >> 63 KSP Residual norm 4.617426258215e-10 >> ? 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >> ? 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >> 64 KSP Residual norm 3.659090331422e-10 >> ? 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >> ? 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >> 65 KSP Residual norm 2.457005532004e-10 >> ? 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >> ? 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >> 66 KSP Residual norm 1.765446010945e-10 >> ? 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >> >> Jed, with modified gram schmidt procedure, fgmres yields the >> following, which looks like the same as before: >> >> 49 KSP Residual norm 2.426160176080e-08 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> 50 KSP Residual norm 1.864914790828e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> 51 KSP Residual norm 6.741080961009e+01 >> ? 
51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> >> But I generally see that the true residual of GCR seems to converge to >> desired tolerance but for GMRES, the convergence stagnates with >> different options on my MG preconditioner. This is puzzling to me >> since I spent enough time making sure that the preconditioner was >> working correctly but I will look more into this now. Thanks for all >> the helpful comments guys ! I will post here if I find any other >> curious behavior. >> >> Vijay >> >> On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >>> >>> ?Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>> >>> ?Barry >>> >>> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, I tried with the true_residual_norm option and it gives me the >>>> exact same convergence as the one I have shown before. >>>> >>>> 45 KSP Residual norm 2.511364148934e-07 >>>> ? 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>>> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>> 46 KSP Residual norm 1.307034672896e-07 >>>> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>> 47 KSP Residual norm 7.105770015635e-08 >>>> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>> 48 KSP Residual norm 4.098578230710e-08 >>>> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>> 49 KSP Residual norm 2.426160176080e-08 >>>> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>> 50 KSP Residual norm 1.864914790828e+02 >>>> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>> 51 KSP Residual norm 6.741080961009e+01 >>>> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>> 52 KSP Residual norm 5.191621875736e+01 >>>> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>> ? 
53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>> 53 KSP Residual norm 4.513782866249e+01 >>>> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>> ? 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>>> >>>> >>>> Vijay >>>> >>>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>>> >>>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>>> >>>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>>> >>>>>> FGMRES only does right preconditioning so it should be showing the true residual. >>>>> >>>>> ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>>>> >>>>> ? Barry >>>>> >>>>> >>>>> >>> >>> > > From bsmith at mcs.anl.gov Tue Dec 21 15:34:30 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 15:34:30 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: <9ABA7B64-06BA-4025-9D57-FA25161929A3@mcs.anl.gov> On Dec 21, 2010, at 3:16 PM, Vijay S. Mahadevan wrote: > Also GCR seems to use and allocate comparatively more vectors, > translating to lot more memory. Yes > This does make FGMRES more attractive. > I will look at the preconditioner and try to find the true cause of > the issue. > > Cheers, > Vijay > > On Tue, Dec 21, 2010 at 2:52 PM, Barry Smith wrote: >> >> The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). >> >> Barry >> >> >> On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: >> >>>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>> Ah yes. I was reading the output wrongly. Thanks for pointing that >>> out. So then it is quite possible that my preconditioner is terrible >>> for this problem. >>> >>> Curiously with GCR, the true residual does converge. 
>>> >>> 62 KSP Residual norm 6.845396874593e-10 >>> 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm >>> 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 >>> 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >>> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >>> 63 KSP Residual norm 4.617426258215e-10 >>> 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >>> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >>> 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >>> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >>> 64 KSP Residual norm 3.659090331422e-10 >>> 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >>> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >>> 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >>> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >>> 65 KSP Residual norm 2.457005532004e-10 >>> 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >>> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >>> 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >>> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >>> 66 KSP Residual norm 1.765446010945e-10 >>> 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >>> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >>> >>> Jed, with modified gram schmidt procedure, fgmres yields the >>> following, which looks like the same as before: >>> >>> 49 KSP Residual norm 2.426160176080e-08 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 50 KSP Residual norm 1.864914790828e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 51 KSP Residual norm 6.741080961009e+01 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> >>> But I generally see that the true residual of GCR seems to converge to >>> desired tolerance but for GMRES, the convergence stagnates with >>> different options on my MG preconditioner. This is puzzling to me >>> since I spent enough time making sure that the preconditioner was >>> working correctly but I will look more into this now. Thanks for all >>> the helpful comments guys ! I will post here if I find any other >>> curious behavior. >>> >>> Vijay >>> >>> On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >>>> >>>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>>> >>>> Barry >>>> >>>> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, I tried with the true_residual_norm option and it gives me the >>>>> exact same convergence as the one I have shown before. 
>>>>> >>>>> 45 KSP Residual norm 2.511364148934e-07 >>>>> 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>>>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>>>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>>> 46 KSP Residual norm 1.307034672896e-07 >>>>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>>> 47 KSP Residual norm 7.105770015635e-08 >>>>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>>> 48 KSP Residual norm 4.098578230710e-08 >>>>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>>> 49 KSP Residual norm 2.426160176080e-08 >>>>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>>> 50 KSP Residual norm 1.864914790828e+02 >>>>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>>> 51 KSP Residual norm 6.741080961009e+01 >>>>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>>> 52 KSP Residual norm 5.191621875736e+01 >>>>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>>> 53 KSP Residual norm 4.513782866249e+01 >>>>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>>> 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>>>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>>>> >>>>> >>>>> Vijay >>>>> >>>>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>>>> >>>>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>>>> >>>>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>>>> >>>>>>> FGMRES only does right preconditioning so it should be showing the true residual. 
>>>>>> >>>>>> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From gaurish108 at gmail.com Tue Dec 21 22:03:30 2010 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 21 Dec 2010 23:03:30 -0500 Subject: [petsc-users] Reading matrices into PETSc Message-ID: i have this large text file containing a matrix. This text file contains the non-zero entries of a very large sparse matrix The first two columns indicate the position of the non-zero entry and the last column the actual non-zero value it self for example the matrix 1 0 8 0 0 5 6 0 0 is written in the text file in the form of 1 1 1 1 3 8 2 3 5 3 1 6 This is the standard [(row ,column), non-zero] entry format. i want PETSc to load this matrix from the text file i am not sure how to do that. What commands do I use? I am new to PETSc, so some detail in the explanation will be really helpful. Sincere thanks, Gaurish. -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.tabak at tudelft.nl Tue Dec 21 22:09:22 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Wed, 22 Dec 2010 05:09:22 +0100 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <4D1179F2.7020306@tudelft.nl> Gaurish Telang wrote: > i have this large text file containing a matrix. > This text file contains the non-zero entries of a very large sparse > matrix > Read the documentation on MatSetValues function. It is a starting point. From bsmith at mcs.anl.gov Tue Dec 21 22:16:47 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 22:16:47 -0600 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <0B40812B-4A1A-4021-A5D0-1EA21D2F66AC@mcs.anl.gov> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#sparse-matrix-ascii-format Barry On Dec 21, 2010, at 10:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. > > This text file contains the non-zero entries of a very large sparse matrix > > > The first two columns indicate the position of the non-zero entry > > and the last column the actual non-zero value it self > > > for > example the matrix > > 1 0 8 > > 0 0 5 > > 6 0 0 > > is written in the text file in the form of > > 1 1 1 > > 1 3 8 > > 2 3 5 > > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. > i want PETSc to load this matrix > > from the text file > > i am not sure how > > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really helpful. > > Sincere thanks, > > Gaurish. From gaurish108 at gmail.com Wed Dec 22 01:56:33 2010 From: gaurish108 at gmail.com (Gaurish Telang) Date: Wed, 22 Dec 2010 02:56:33 -0500 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. 
> This text file contains the non-zero entries of a very large sparse > matrix > > The first two columns indicate the position of the non-zero entry > and the last column the actual non-zero value it self > > for > example the matrix > 1 0 8 > 0 0 5 > 6 0 0 > is written in the text file in the form of > 1 1 1 > 1 3 8 > 2 3 5 > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. i want > PETSc to load this matrix > from the text file > i am not sure how > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really > helpful. > > Sincere thanks, > > Gaurish. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at mcs.anl.gov Wed Dec 22 03:41:21 2010 From: abhyshr at mcs.anl.gov (Shri) Date: Wed, 22 Dec 2010 03:41:21 -0600 (CST) Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: Message-ID: <1372925860.24678.1293010881196.JavaMail.root@zimbra.anl.gov> Find attached a routine which reads matrix data from an ASCII file in i j value format and creates a seqaij matrix. ----- Original Message ----- I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang < gaurish108 at gmail.com > wrote: i have this large text file containing a matrix. This text file contains the non-zero entries of a very large sparse matrix The first two columns indicate the position of the non-zero entry and the last column the actual non-zero value it self for example the matrix 1 0 8 0 0 5 6 0 0 is written in the text file in the form of 1 1 1 1 3 8 2 3 5 3 1 6 This is the standard [(row ,column), non-zero] entry format. i want PETSc to load this matrix from the text file i am not sure how to do that. What commands do I use? I am new to PETSc, so some detail in the explanation will be really helpful. Sincere thanks, Gaurish. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ReadMatFromFile.c Type: text/x-csrc Size: 2125 bytes Desc: not available URL: From thomas.witkowski at tu-dresden.de Wed Dec 22 03:07:22 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 10:07:22 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: References: <4D10B086.2090503@tu-dresden.de> Message-ID: <4D11BFCA.2030800@tu-dresden.de> Okay, in my computations, I have empty partitions on some ranks and definitely not minimal boundary sizes. So may be I generate a wrong input. But if this is the case, I wonder why the resulting mesh partitioning is quite good. If I neglect the problem of empty partitions, the redistributed mesh leads to a very good load balancing. Is there any meaningful way to debug the problem? Is there something link a "verbose mode" in ParMetis that says me whats happen on the input data? Otherwise I have to print all the input data to the screen and check it by hand. 
Although I have a quite small example with 128 overall coarse mesh elements on 8 ranks, this is not big fun :) Thomas @Matthew: By mistake I've answered your mail directly to you and not to the mailing list, therefore I sent it now here again Matthew Knepley wrote: > On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski > > wrote: > > Hi, > > I have a not directly PETSc related question, but I hope to get > some answer from the community here. In my FEM code, I make use of > ParMETIS to partition the mesh. I make direct use of this library > and not of PETSc's ParMETIS integration. The initial partition is > always fine, but I use the ParMETIS_V3_AdaptiveRepart function for > repartition the mesh due to local mesh adaption. In most cases, > the result is fine, but there are two points, where I have trouble > with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a > processor has zero mesh elements. This is something my code cannot > handle. Is this a bug or a feature? If it is a feature, is there > any possiblity to disable it? > > > ParMetis has a balance constraint if you weight vertices. This will > enforce equal size partitions. > > > 2) In most cases the specific partitions are not connected. If I > put all data to ParMETIS in a correct way, is this okay? My code > can handle it, but is slows down the computation due to larger > interior boundaries and therefore to more communications. > > > ParMetis minimizes the overall boundary size, so I do not understand > how you could see this slowdown. > > Matt > > > Does anyone of you know an answer to these question? Is there a > debug mode in ParMETIS, where I can see which data is set to its > function calls? > > Regards, > > Thomas > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From thomas.witkowski at tu-dresden.de Wed Dec 22 03:12:34 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 10:12:34 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: References: <4D10B086.2090503@tu-dresden.de> <20101221205356.xt2c3cnyockkck88@mail.zih.tu-dresden.de> Message-ID: <4D11C102.8020605@tu-dresden.de> Matthew Knepley wrote: > On Tue, Dec 21, 2010 at 11:53 AM, Thomas Witkowski > > wrote: > > Okay, in my computations, I have empty partitions on some ranks > and definitely not minimal boundary sizes. So may be I generate a > wrong input. But if this is the case, I wonder why the resulting > mesh partitioning is quite good. If I neglect the problem of empty > > > The above statement does not make any sense. You can get perfect load > balancing by just chopping the mesh into > equal parts. You only care about using a mesh partitioner if you want > to minimize the cut size (boundary length, communication, > etc.) My situation is slightly different. I make the partitioning and distribution of the mesh on a coarse level. The coarser elements may be further adapted. For ParMetis, I use the number of fine elements in the coarse mesh elements to weight them. Therefore, I do not get equal parts with respect to the number of coarse mesh elements. But what do not want to have are empty partitions. And in my test case with 128 coarse mesh elements and 8 processes, I get using either ParMETIS_V3_PartMeshKway or ParMETIS_V3_AdaptiveRepart two empty partitions. I wrote a function to print the dual graph of the mesh, and it looks fine. 
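For reference, such a dump does not need to be more than a few lines; a minimal sketch (illustrative only, assuming the usual ParMETIS distributed CSR arrays vtxdist/xadj/adjncy with C-style numbering, a single weight per vertex, and plain int storage; it makes no ParMETIS calls itself) could look like this:

#include <stdio.h>
#include <mpi.h>

/* Print this rank's part of the distributed dual graph in the CSR layout
   that ParMETIS takes: vtxdist (global vertex ranges per rank), xadj
   (row pointers into adjncy), adjncy (global neighbour ids), and the
   optional vertex weights vwgt.  Debugging aid only. */
static void print_dual_graph(MPI_Comm comm, const int *vtxdist,
                             const int *xadj, const int *adjncy,
                             const int *vwgt)
{
  int rank, nlocal, i, j;

  MPI_Comm_rank(comm, &rank);
  nlocal = vtxdist[rank + 1] - vtxdist[rank];   /* number of local vertices */

  for (i = 0; i < nlocal; i++) {
    printf("[%d] vertex %d (weight %d):", rank, vtxdist[rank] + i,
           vwgt ? vwgt[i] : 1);
    for (j = xadj[i]; j < xadj[i + 1]; j++)
      printf(" %d", adjncy[j]);                 /* global ids of neighbours */
    printf("\n");
  }
}

Comparing such a dump against the known coarse-mesh connectivity on a case as small as 128 elements on 8 ranks is usually enough to spot an indexing or weighting mistake.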
Thomas > Matt > > > partitions, the redistributed mesh leads to a very good load > balancing. Is there any meaningful way to debug the problem? Is > there something link a "verbose mode" in ParMetis that says me > whats happen on the input data? Otherwise I have to print all the > input data to the screen and check it by hand. Although I have a > quite small example with 128 overall coarse mesh elements on 8 > ranks, this is not big fun :) > > Thomas > > Zitat von Matthew Knepley >: > > > On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < > thomas.witkowski at tu-dresden.de > > wrote: > > Hi, > > I have a not directly PETSc related question, but I hope > to get some answer > from the community here. In my FEM code, I make use of > ParMETIS to partition > the mesh. I make direct use of this library and not of > PETSc's ParMETIS > integration. The initial partition is always fine, but I > use the > ParMETIS_V3_AdaptiveRepart function for repartition the > mesh due to local > mesh adaption. In most cases, the result is fine, but > there are two points, > where I have trouble with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a > processor has > zero mesh elements. This is something my code cannot > handle. Is this a bug > or a feature? If it is a feature, is there any possiblity > to disable it? > > > ParMetis has a balance constraint if you weight vertices. This > will enforce > equal size partitions. > > > 2) In most cases the specific partitions are not > connected. If I put all > data to ParMETIS in a correct way, is this okay? My code > can handle it, but > is slows down the computation due to larger interior > boundaries and > therefore to more communications. > > > ParMetis minimizes the overall boundary size, so I do not > understand how you > could see this slowdown. > > Matt > > > Does anyone of you know an answer to these question? Is > there a debug mode > in ParMETIS, where I can see which data is set to its > function calls? > > Regards, > > Thomas > > > > > -- > What most experimenters take for granted before they begin > their experiments > is infinitely more interesting than any results to which their > experiments > lead. > -- Norbert Wiener > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From u.tabak at tudelft.nl Wed Dec 22 03:32:57 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Wed, 22 Dec 2010 10:32:57 +0100 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <4D11C5C9.1050805@tudelft.nl> On 12/22/2010 08:56 AM, Gaurish Telang wrote: > but I am not yet clear on how to do this. I read ex32.c and ex72.c but > I am still confused. What is ASCII 'slap' format? How should matrix > be supplied to PETSc? > > My matrix is a 2000x1900 matrix given in the format MATLAB stores > sparse matrices. i.e What exactly do you want to do, here is a simple example for Compressed Row Storage (CSR) format which is the default format in PETSc as far as I know. You can use the MatSetValues function to set the nonzeros, Have you tried to read the documentation about the matrices in the user manual? 
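If the goal is just to get that "i j value" text file into a Mat, a rough, untested sketch along the following lines may be a useful starting point (it assumes a serial run, 1-based indices in the file, a hypothetical file name matrix.txt, and a made-up preallocation guess of 30 nonzeros per row; a real code should count the nonzeros per row first and preallocate exactly):

#include <stdio.h>
#include "petscmat.h"

int main(int argc, char **argv)
{
  Mat            A;
  FILE           *fp;
  PetscInt       m = 2000, n = 1900, row, col;
  int            fi, fj;
  double         val;
  PetscScalar    v;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0);CHKERRQ(ierr);

  /* rough preallocation guess of 30 nonzeros per row */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 30, PETSC_NULL, &A);CHKERRQ(ierr);

  fp = fopen("matrix.txt", "r");            /* hypothetical file name */
  if (!fp) { PetscPrintf(PETSC_COMM_SELF, "cannot open matrix.txt\n"); return 1; }
  while (fscanf(fp, "%d %d %lf", &fi, &fj, &val) == 3) {
    row = fi - 1; col = fj - 1;             /* file is 1-based, PETSc is 0-based */
    v   = val;
    ierr = MatSetValues(A, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* the assembled matrix could now be written once in PETSc binary format
     (PetscViewerBinaryOpen + MatView) and re-read later with MatLoad,
     which is typically much faster than re-parsing the ASCII file */
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}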
Matlab pseudo code: A = diag([1 1 1]). So the table is something like this:

i j val
0 0 1
1 1 1
2 2 1

see this:

Mat A; // of course, A should be created first and assembled after the set operation
PetscInt m = 3;
PetscInt n = 3;
PetscInt indxm[] = {0, 1, 2};
PetscInt indxn[] = {0, 1, 2};
PetscScalar vals[] = {1., 1., 1.};
MatSetValues(A, m, indxm, n, indxn, vals, INSERT_VALUES); // assemble after

HTH, Umut -- - Hope is a good thing, maybe the best of things and no good thing ever dies... The Shawshank Redemption, replique of Tim Robbins

From thomas.witkowski at tu-dresden.de Wed Dec 22 08:01:13 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 15:01:13 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D11BFCA.2030800@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> <4D11BFCA.2030800@tu-dresden.de> Message-ID: <4D1204A9.7090209@tu-dresden.de> So, I found the problem related to empty partitions. It is not possible to weight vertices (i.e. elements of a mesh) in such a way that one weight is much higher than the other ones. For more details see http://glaros.dtc.umn.edu/flyspray/task/11 It's a pity that ParMetis makes it very hard to find this kind of error. The open question for me is about the non-contiguous partitions. Is it normal behavior for ParMetis to create partitions that are not contiguous? Thomas Thomas Witkowski wrote: > Okay, in my computations, I have empty partitions on some ranks and > definitely not > minimal boundary sizes. So may be I generate a wrong input. But if > this is the case, I > wonder why the resulting mesh partitioning is quite good. If I neglect > the problem of > empty partitions, the redistributed mesh leads to a very good load > balancing. Is there > any meaningful way to debug the problem? Is there something link a > "verbose mode" in > ParMetis that says me whats happen on the input data? Otherwise I have > to print all the > input data to the screen and check it by hand. Although I have a quite > small example with > 128 overall coarse mesh elements on 8 ranks, this is not big fun :) > > Thomas > > @Matthew: By mistake I've answered your mail directly to you and not > to the mailing list, therefore I sent it now here again > > Matthew Knepley wrote: >> On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski >> > > wrote: >> >> Hi, >> >> I have a not directly PETSc related question, but I hope to get >> some answer from the community here. In my FEM code, I make use of >> ParMETIS to partition the mesh. I make direct use of this library >> and not of PETSc's ParMETIS integration. The initial partition is >> always fine, but I use the ParMETIS_V3_AdaptiveRepart function for >> repartition the mesh due to local mesh adaption. In most cases, >> the result is fine, but there are two points, where I have trouble >> with: >> >> 1) Sometimes ParMETIS generates empty partitions, i.e., a >> processor has zero mesh elements. This is something my code cannot >> handle. Is this a bug or a feature? If it is a feature, is there >> any possiblity to disable it? >> >> >> ParMetis has a balance constraint if you weight vertices. This will >> enforce equal size partitions. >> >> >> 2) In most cases the specific partitions are not connected. If I >> put all data to ParMETIS in a correct way, is this okay? My code >> can handle it, but is slows down the computation due to larger >> interior boundaries and therefore to more communications. >> >> >> ParMetis minimizes the overall boundary size, so I do not understand >> how you could see this slowdown.
>> >> Matt >> >> >> Does anyone of you know an answer to these question? Is there a >> debug mode >> in ParMETIS, where I can see which data is set to its >> function calls? >> >> Regards, >> >> Thomas >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > >

From bsmith at mcs.anl.gov Wed Dec 22 08:09:33 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 22 Dec 2010 08:09:33 -0600 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: On Dec 22, 2010, at 1:56 AM, Gaurish Telang wrote: > I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? You will need to modify one of those examples slightly to read in the exact format of the ASCII file that you have. The names of the various formats don't matter; you just need to match the reading of the ASCII file to the exact format of your file. Barry > > My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. > > > > On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. > > This text file contains the non-zero entries of a very large sparse matrix > > > The first two columns indicate the position of the non-zero entry > > and the last column the actual non-zero value it self > > > for > example the matrix > > 1 0 8 > > 0 0 5 > > 6 0 0 > > is written in the text file in the form of > > 1 1 1 > > 1 3 8 > > 2 3 5 > > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. > i want PETSc to load this matrix > > from the text file > > i am not sure how > > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really helpful. > > Sincere thanks, > > Gaurish. >

From yjxd.chen at gmail.com Wed Dec 22 09:55:23 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 16:55:23 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Satish, I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speedup now keeps increasing as the number of cores goes from 1 to 16. However, the maximum speedup is still only around 6.0 with 16 cores. The new log files can be found in the attachment. (1) I checked the configuration of the first server again. This server is a shared-memory computer, with
Processors: 4 CPUs * 4 cores/CPU, each core 2500 MHz
Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit,
so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s. It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers. Is this correct? Does a shared-memory computer offer no benefit for PETSc when the memory bandwidth is limited? (2) Besides, we would like to continue our work by employing a matrix partitioning/reordering algorithm, such as Metis or ParMetis, to improve the speedup of the program. (The current program works without any matrix decomposition.) Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html , "Reordering a matrix can result in fewer iterations for an iterative solver".
Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions? Any comments are very welcome! Thank you! On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote: > On Mon, 20 Dec 2010, Yongjun Chen wrote: > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly > and > > see what I can get. > > hydra is just the process manager. > > Also --download-mpich uses a slightly older version - with > device=ch3:sock for portability and valgrind reasons [development] > > You might want to install latest mpich manually with the defaut > device=ch3:nemsis and recheck.. > > satish > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Process 0 of total 4 on wmss04 Process 2 of total 4 on wmss04 Process 1 of total 4 on wmss04 Process 3 of total 4 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:41:09 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28342e-06 Norm of error 1.28342e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 420.527 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:48:09 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Wed Dec 22 12:48:09 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 4.531e+02 1.00000 4.531e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 Flops/sec: 3.438e+08 1.06872 3.361e+08 1.344e+09 MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.5314e+02 100.0% 6.0914e+11 100.0% 1.772e+04 100.0% 2.658e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.7876e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1617 MatMultTranspose 1473 1.0 1.7886e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1615 MatAssemblyBegin 1 1.0 3.2670e-0312.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.1171e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.6379e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0934e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2946 1.0 1.9010e+01 2.2 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03 3 1 0 0 66 3 1 0 0 66 365 VecNorm 1475 1.0 1.0313e+01 2.8 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 337 VecCopy 4 1.0 5.2447e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.8803e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.3866e+01 1.5 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 751 VecAYPX 2944 1.0 1.0440e+01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 664 VecAssemblyBegin 6 1.0 1.0071e-0161.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 2.4080e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 1.6040e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216 VecScatterBegin 2947 1.0 1.7367e+00 2.2 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 3.0331e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 KSPSetup 1 1.0 1.3974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.0934e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 90100100100 99 90100100100 99 1488 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 1.6080e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 169902696 0 Vec 18 18 31282096 0 Vec Scatter 2 2 1736 0 Index Set 4 4 638616 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.19209e-06 Average time for MPI_Barrier(): 5.97954e-05 Average time for zero size MPI_Send(): 2.07424e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:12:03 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.989 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:16:55 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 12:16:55 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.113e+02 1.00000 3.113e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.503e+08 1.09702 2.446e+08 1.957e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.1128e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. 
len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.2879e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2244 MatMultTranspose 1473 1.0 1.2240e+02 1.3 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2360 MatAssemblyBegin 1 1.0 3.1061e-03 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.0727e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.2912e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1926e+0113.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 6.5343e+0113.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 106 VecNorm 1475 1.0 6.9889e+00 3.6 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 497 VecCopy 4 1.0 5.1496e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.2587e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 8.7103e+00 1.5 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1195 VecAYPX 2944 1.0 5.7803e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1200 VecAssemblyBegin 6 1.0 3.9916e-0214.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6001e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.6749e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 400 VecScatterBegin 2947 1.0 1.9621e+00 2.7 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 5.9072e+0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 8.9231e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.7991e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2175 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.7041e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 399 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 4.3869e-06 Average time for MPI_Barrier(): 7.25746e-05 Average time for zero size MPI_Send(): 2.06232e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 1Process 3 of total 12 on wmss04 of total 12 on wmss04 Process 5 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Process 9 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 7 of total 12 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 12:13:43 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 253.909 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 12:17:57 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 13:17:57 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.721e+02 1.00000 2.721e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.910e+08 1.11689 1.865e+08 2.238e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.7212e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.2467e+02 1.6 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2318 MatMultTranspose 1473 1.0 1.0645e+02 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2712 MatAssemblyBegin 1 1.0 4.0723e-0274.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3137e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.8801e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2262e+0190.2 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 6.1395e+0111.5 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 113 VecNorm 1475 1.0 5.8101e+00 3.3 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 598 VecCopy 4 1.0 5.6744e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.1137e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 6.6266e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1571 VecAYPX 2944 1.0 5.2210e+00 2.3 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1328 VecAssemblyBegin 6 1.0 5.0129e-0218.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 4.7922e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 7.0911e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 490 VecScatterBegin 2947 1.0 2.5096e+00 3.1 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 4.4540e+01 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 7.9119e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.4149e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 89100100100 99 89100100100 99 2521 PCSetUp 1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 7.1207e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 488 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.00815e-06 Average time for MPI_Barrier(): 0.000122833 Average time for zero size MPI_Send(): 2.81533e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 3 of total 16 on wmss04 Process 7 of total 16 on wmss04 Process 1 of total 16 on wmss04 Process 15 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 13 of total 16 on wmss04 Process 11 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 0 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 8 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly.End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:23:54 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.194e-06 Norm of error 1.194e-06, Iterations 1495 ========================================================= The solver has finished successfully! ========================================================= The solving time is 240.208 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:27:54 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 12:27:54 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.565e+02 1.00001 2.565e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.959e+10 1.13060 3.859e+10 6.174e+11 Flops/sec: 1.543e+08 1.13060 1.504e+08 2.407e+09 MPI Messages: 1.198e+04 3.99917 7.118e+03 1.139e+05 MPI Message Lengths: 1.948e+09 7.80981 1.819e+05 2.071e+10 MPI Reductions: 4.543e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.5651e+02 100.0% 6.1737e+11 100.0% 1.139e+05 100.0% 1.819e+05 100.0% 4.527e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1496 1.0 1.1625e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2520 MatMultTranspose 1495 1.0 9.7790e+01 1.2 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2994 MatAssemblyBegin 1 1.0 6.3910e-0314.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.2797e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 3.0708e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1235e+01111.3 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2990 1.0 5.7054e+0114.6 4.40e+08 1.0 0.0e+00 0.0e+00 3.0e+03 9 1 0 0 66 9 1 0 0 66 123 VecNorm 1497 1.0 5.8130e+00 3.5 2.20e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 607 VecCopy 4 1.0 3.3658e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8975 1.0 2.5879e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4486 1.0 7.5991e+00 1.6 6.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1391 VecAYPX 2988 1.0 4.6226e+00 1.6 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1523 VecAssemblyBegin 6 1.0 3.9858e-0213.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 6.6996e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2992 1.0 7.0992e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 496 VecScatterBegin 2991 1.0 3.3736e+00 3.7 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2991 1.0 3.3633e+01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 5.6469e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2884e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 89100100100 99 89100100100 99 2697 PCSetUp 1 1.0 5.0068e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2992 1.0 7.1263e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 494 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 42424600 0 Vec 18 18 7924896 0 Vec Scatter 2 2 1736 0 Index Set 4 4 247632 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 8.91685e-06 Average time for MPI_Barrier(): 0.000128984 Average time for zero size MPI_Send(): 1.8239e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------

From bsmith at mcs.anl.gov  Wed Dec 22 10:40:49 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Wed, 22 Dec 2010 10:40:49 -0600
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>

On Dec 22, 2010, at 9:55 AM, Yongjun Chen wrote:

> Satish,
>
> I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
>
> (1)
>
> I checked the configuration of the first server again. This server is a shared-memory computer, with
>
> Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
>
> Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.

   Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.

   Barry

> It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers.
>
> Is this correct? Does the shared-memory type of computer have no benefit for PETSc when the memory bandwidth is limited?
>
> (2)
>
> Besides, we would like to continue our work by employing a matrix partitioning / reordering algorithm, such as Metis or ParMetis, to improve the speed up performance of the program. (The current program works without any matrix decomposition.)
>
> Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html, "Reordering a matrix can result in fewer iterations for an iterative solver".
>
> Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions?
>
> Any comments are very welcome! Thank you!
>
> On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote:
> > On Mon, 20 Dec 2010, Yongjun Chen wrote:
> >
> > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and
> > > see what I can get.
> >
> > hydra is just the process manager.
> >
> > Also --download-mpich uses a slightly older version - with
> > device=ch3:sock for portability and valgrind reasons [development]
> >
> > You might want to install latest mpich manually with the default
> > device=ch3:nemsis and recheck..
> >
> > satish
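Barry's point can be made concrete with a back-of-the-envelope count for a generic compressed-row (CSR) matrix-vector product, the kind of kernel a sparse iterative solver spends most of its time in. The loop below is only an illustrative plain-C sketch, not PETSc's MatMult implementation; the traffic estimate in the comment is the usual rough count.

#include <stddef.h>

/* Generic CSR y = A*x. For each stored nonzero we read one column index
 * (4 bytes) and one value (8 bytes) and perform 2 flops (multiply + add),
 * so the kernel needs roughly 6 bytes of memory traffic per flop even
 * before counting accesses to x and y. At ~6 bytes/flop, a core fed with
 * 2.7 GB/s can sustain only a few hundred Mflop/s, no matter how fast its
 * arithmetic units are - which is why sparse solvers are bandwidth-bound. */
void csr_matvec(size_t nrows, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
  for (size_t i = 0; i < nrows; i++) {
    double sum = 0.0;
    for (int k = rowptr[i]; k < rowptr[i+1]; k++)
      sum += val[k] * x[colind[k]];   /* ~12 bytes read, 2 flops */
    y[i] = sum;
  }
}

That estimate is roughly the order of magnitude seen in the attached -log_summary output, where the MatMult lines work out to a few hundred Mflop/s per process.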
From yjxd.chen at gmail.com  Wed Dec 22 10:46:26 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 17:46:26 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:

> On Dec 22, 2010, at 9:55 AM, Yongjun Chen wrote:
>
> > Satish,
> >
> > I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
> >
> > (1)
> >
> > I checked the configuration of the first server again. This server is a shared-memory computer, with
> >
> > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> >
> > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
>
>    Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
>
>    Barry

Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.

Yongjun

> > It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers.
> >
> > Is this correct? Does the shared-memory type of computer have no benefit for PETSc when the memory bandwidth is limited?
> >
> > (2)
> >
> > Besides, we would like to continue our work by employing a matrix partitioning / reordering algorithm, such as Metis or ParMetis, to improve the speed up performance of the program. (The current program works without any matrix decomposition.)
> >
> > Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html, "Reordering a matrix can result in fewer iterations for an iterative solver".
> >
> > Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions?
> >
> > Any comments are very welcome! Thank you!
> >
> > On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote:
> > > On Mon, 20 Dec 2010, Yongjun Chen wrote:
> > >
> > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and
> > > > see what I can get.
> > >
> > > hydra is just the process manager.
> > >
> > > Also --download-mpich uses a slightly older version - with
> > > device=ch3:sock for portability and valgrind reasons [development]
> > >
> > > You might want to install latest mpich manually with the default
> > > device=ch3:nemsis and recheck..
> > >
> > > satish
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
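On the Metis/ParMetis question quoted above: PETSc exposes ParMetis through its MatPartitioning interface. The fragment below is only a minimal sketch against the petsc-3.1 C API used in this thread; the function names are real, but the surrounding program, error handling and the actual redistribution of the application data are omitted, and MatPartitioningSetAdjacency() is normally fed an adjacency matrix describing the nonzero pattern (see MatCreateMPIAdj()), so a conversion step may be needed depending on the matrix format. Note also that with a Jacobi preconditioner a symmetric permutation should mainly reduce off-process communication in MatMult rather than the iteration count.

#include "petscmat.h"

/* Sketch: ask the partitioner (e.g. ParMetis) for a better row
 * distribution of a parallel matrix A. On return, "is" holds, for every
 * locally owned row, the rank that the partitioner wants to own it.
 * (petsc-3.1 calling sequence; later releases changed the Destroy()
 * routines to take a pointer.) */
PetscErrorCode suggest_partition(Mat A, IS *is)
{
  MatPartitioning part;
  PetscErrorCode  ierr;

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
  /* pick the package at run time, e.g. -mat_partitioning_type parmetis */
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
  ierr = MatPartitioningApply(part, is);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);
  return 0;
}

The resulting IS can then be turned into a new global numbering with ISPartitioningToNumbering() and ISPartitioningCount() before the matrix and vectors are reassembled with the new layout.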
From balay at mcs.anl.gov  Wed Dec 22 10:48:07 2010
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 22 Dec 2010 10:48:07 -0600 (CST)
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Wed, 22 Dec 2010, Yongjun Chen wrote:

> Satish,
>
> I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.

Perhaps this was mentioned earlier: performance doesn't scale with the number of cores alone. [It depends on both scalable compute units - aka cores - and scalable memory modules.]

When the hardware is not designed to provide scalable performance - expecting it is wrong. The goal should be to extract max performance out of a given piece of hardware - not scalable performance.

Wrt with-device=ch3:sock - it might not be the best performer for shared memory. Try the default 'device=ch3:nemsis'.

Satish

From balay at mcs.anl.gov  Wed Dec 22 10:54:53 2010
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 22 Dec 2010 10:54:53 -0600 (CST)
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, 22 Dec 2010, Yongjun Chen wrote:

> On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:
>
> > > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> > >
> > > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
> >
> > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
>
> Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.

What hardware is this? [processor/chipset?]

From what you say - it looks like each chip has 4 cores, and 2 dual-channel memory controllers for each of them.

The question is - does the hardware provide scalable memory bandwidth per core? Most machines don't.

I.e. the same 5.4*2 GB/s is available for a 1-core run as well as a 4-core run.

So if the algorithm is able to use 5.4 GB/s [or more] for 1 thread, 10.8 [or more] for 2 threads - you would just see scalable performance from 1 to 2, and 3, 4 would perhaps be slightly incremental to the 2-core performance.

Satish
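Whether the memory bandwidth really scales with the number of busy cores, as discussed above, can be measured directly. The toy MPI program below is not from this thread and its sizes are arbitrary; it runs a STREAM-triad-style loop on every rank, so running it with mpiexec -n 1, 2, 4, ... shows how the aggregate GB/s grows. For reference, one DDR2-333 dual-channel pair moves about 8 bytes * 333 MT/s * 2 channels = 5.3 GB/s, so with 8 such pairs the theoretical aggregate is around 43 GB/s; the interesting question is how much of that a given number of cores can actually reach.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N    10000000   /* 10M doubles per array, ~80 MB each */
#define REPS 20

int main(int argc, char **argv)
{
  int    rank, size;
  double *a, *b, *c, t, gbps, total;
  long   i, r;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  a = malloc(N * sizeof(double));
  b = malloc(N * sizeof(double));
  c = malloc(N * sizeof(double));
  for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);
  t = MPI_Wtime();
  for (r = 0; r < REPS; r++)
    for (i = 0; i < N; i++)
      a[i] = b[i] + 3.0 * c[i];        /* triad: at least 24 bytes of traffic per update */
  MPI_Barrier(MPI_COMM_WORLD);
  t = MPI_Wtime() - t;

  gbps = 24.0 * N * REPS / t / 1e9;    /* per-process bandwidth, counting 24 bytes/update */
  MPI_Reduce(&gbps, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (!rank)
    printf("%d processes: %.1f GB/s aggregate (%.1f GB/s per process)\n",
           size, total, total / size);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Compile with mpicc and run at increasing process counts; if the aggregate bandwidth stops growing after a few processes per socket, the flat speed up of the bicg/jacobi runs in the attached logs is what one would expect.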
From yjxd.chen at gmail.com  Wed Dec 22 11:12:43 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 18:12:43 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote:

> On Wed, 22 Dec 2010, Yongjun Chen wrote:
> > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:
> > > > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> > > >
> > > > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
> > >
> > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
> >
> > Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.
>
> What hardware is this? [processor/chipset?]
>
By dmidecode, it shows the processor is

Handle 0x0010, DMI type 4, 40 bytes
Processor Information
    Socket Designation: CPU 4
    Type: Central Processor
    Family: Quad-Core Opteron
    Manufacturer: AMD
    ID: 06 05 F6 40 74 03 E8 3D
    Signature: Family 5, Model 0, Stepping 6
    Flags:
        DE (Debugging extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        CLFSH (CLFLUSH instruction supported)
        DS (Debug store)
        ACPI (ACPI supported)
        MMX (MMX technology supported)
        FXSR (Fast floating-point save and restore)
        SSE2 (Streaming SIMD extensions 2)
        SS (Self-snoop)
        HTT (Hyper-threading technology)
        TM (Thermal monitor supported)
    Version: Quad-Core AMD Opteron(tm) Processor 8360 SE
    Voltage: 1.5 V
    External Clock: 200 MHz
    Max Speed: 4600 MHz
    Current Speed: 2500 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: 0x0011
    L2 Cache Handle: 0x0012
    L3 Cache Handle: 0x0013
    Serial Number: N/A
    Asset Tag: N/A
    Part Number: N/A
    Core Count: 4
    Core Enabled: 4
    Characteristics:
        64-bit capable

> From what you say - it looks like each chip has 4 cores, and 2 dual-channel memory controllers for each of them.
>
> The question is - does the hardware provide scalable memory bandwidth per core? Most machines don't.

This point is not clear for me right now.

> I.e. the same 5.4*2 GB/s is available for a 1-core run as well as a 4-core run.
>
> So if the algorithm is able to use 5.4 GB/s [or more] for 1 thread, 10.8 [or more] for 2 threads - you would just see scalable performance from 1 to 2, and 3, 4 would perhaps be slightly incremental to the 2-core performance.
>
> Satish
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yjxd.chen at gmail.com  Wed Dec 22 11:23:57 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 18:23:57 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Wed, Dec 22, 2010 at 5:48 PM, Satish Balay wrote:

> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > Satish,
> >
> > I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
>
> Perhaps this was mentioned earlier: performance doesn't scale with the number of cores alone. [It depends on both scalable compute units - aka cores - and scalable memory modules.]
>
> When the hardware is not designed to provide scalable performance - expecting it is wrong. The goal should be to extract max performance out of a given piece of hardware - not scalable performance.
>
> Wrt with-device=ch3:sock - it might not be the best performer for shared memory. Try the default 'device=ch3:nemsis'.
>
> Satish

I am now trying with --with-device=ch3:nemsis. Hope it can deliver a little better performance.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From balay at mcs.anl.gov Wed Dec 22 11:32:10 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 22 Dec 2010 11:32:10 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, 22 Dec 2010, Yongjun Chen wrote: > On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote: > > > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > > > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote: > > > > > > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz > > > > > > > > > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, so > > the > > > > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s. > > > > > > > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not > > > > enough for iterative solvers, in fact this is absolutely terrible for > > > > iterative solvers. You really want 5.4 GB/s PER core! This machine is > > > > absolutely inappropriate for iterative solvers. No package can give you > > good > > > > speedups on this machine. > > > > > > Barry, there are 16 memories, every 2 memories make up one dual channel, > > > thus in this machine there are 8 dual channel, each dual channel has the > > > memory bandwidth 5.4GB/s. > > > > What hardware is this? [processor/chipset?] > > > > By dmidecode, it shows the processor is > > Handle 0x0010, DMI type 4, 40 bytes > Processor Information > Socket Designation: CPU 4 > Type: Central Processor > Family: Quad-Core Opteron > Manufacturer: AMD > ID: 06 05 F6 40 74 03 E8 3D > Signature: Family 5, Model 0, Stepping 6 > Flags: > DE (Debugging extension) > TSC (Time stamp counter) > MSR (Model specific registers) > PAE (Physical address extension) > CX8 (CMPXCHG8 instruction supported) > APIC (On-chip APIC hardware supported) > CLFSH (CLFLUSH instruction supported) > DS (Debug store) > ACPI (ACPI supported) > MMX (MMX technology supported) > FXSR (Fast floating-point save and restore) > SSE2 (Streaming SIMD extensions 2) > SS (Self-snoop) > HTT (Hyper-threading technology) > TM (Thermal monitor supported) > Version: Quad-Core AMD Opteron(tm) Processor 8360 SE > Voltage: 1.5 V > External Clock: 200 MHz > Max Speed: 4600 MHz > Current Speed: 2500 MHz > Status: Populated, Enabled > Upgrade: Other > L1 Cache Handle: 0x0011 > L2 Cache Handle: 0x0012 > L3 Cache Handle: 0x0013 > Serial Number: N/A > Asset Tag: N/A > Part Number: N/A > Core Count: 4 > Core Enabled: 4 > Characteristics: > 64-bit capable ok - your machine has the following schematic.. [from google] http://www.qdpma.com/SystemArchitecture_files/013_Opteron.png > > >From what you say - it looks like each chip has 4cores, and 2 > > dual-channel memory controllers for each of them. > > > > The question is - does the hardware provide scalable memory-bandwidth > > per core? Most machines don't. > > > > This point is not clear for me right now. Hm.. the point is: the hardware designer had 2 choices: - provide a single memory controller per core [so each core gets only 2.7gb/s - i.e 4 memory controllers per CPU, and common L2 cache across all cores not possible] - provide a single memory controller with 2-dual memory channels [i.e 10.8GB/s] thats shared by 1-4 cores. With this - there can be a single L2 cache for all 4 cores. Which of the above 2 is a good design? The first one provides scalable performance - but the second one doesn't. Also the first one limits the performance of sequential [np=1 applications]. 
The second one provides all bandwidth to even np=1 codes - so they might have better sequential performane. And then performance differences due to different cache synchronization issues.. Satish > > > > > I.e the same 5.4*2GB/s is avilable for 1 core run as well as the 4 core > > run. > > > > So if the algorithm is able to use 5.4GB/s [or more] for 1 threads, > > 10.8 [or more] for 2 threads - you would just see scalable performance > > from 1 to 2, and 3, 4 would perhaps be slightly incremental to the > > 2-core performance. > > > > Satish > > > From yjxd.chen at gmail.com Wed Dec 22 11:49:40 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 18:49:40 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote: > > > > > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > > > > > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith > wrote: > > > > > > > > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz > > > > > > > > > > > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, > so > > > the > > > > > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s. > > > > > > > > > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is > not > > > > > enough for iterative solvers, in fact this is absolutely terrible > for > > > > > iterative solvers. You really want 5.4 GB/s PER core! This machine > is > > > > > absolutely inappropriate for iterative solvers. No package can give > you > > > good > > > > > speedups on this machine. > > > > > > > > Barry, there are 16 memories, every 2 memories make up one dual > channel, > > > > thus in this machine there are 8 dual channel, each dual channel has > the > > > > memory bandwidth 5.4GB/s. > > > > > > What hardware is this? [processor/chipset?] > > > > > > > By dmidecode, it shows the processor is > > > > Handle 0x0010, DMI type 4, 40 bytes > > Processor Information > > Socket Designation: CPU 4 > > Type: Central Processor > > Family: Quad-Core Opteron > > Manufacturer: AMD > > ID: 06 05 F6 40 74 03 E8 3D > > Signature: Family 5, Model 0, Stepping 6 > > Flags: > > DE (Debugging extension) > > TSC (Time stamp counter) > > MSR (Model specific registers) > > PAE (Physical address extension) > > CX8 (CMPXCHG8 instruction supported) > > APIC (On-chip APIC hardware supported) > > CLFSH (CLFLUSH instruction supported) > > DS (Debug store) > > ACPI (ACPI supported) > > MMX (MMX technology supported) > > FXSR (Fast floating-point save and restore) > > SSE2 (Streaming SIMD extensions 2) > > SS (Self-snoop) > > HTT (Hyper-threading technology) > > TM (Thermal monitor supported) > > Version: Quad-Core AMD Opteron(tm) Processor 8360 SE > > Voltage: 1.5 V > > External Clock: 200 MHz > > Max Speed: 4600 MHz > > Current Speed: 2500 MHz > > Status: Populated, Enabled > > Upgrade: Other > > L1 Cache Handle: 0x0011 > > L2 Cache Handle: 0x0012 > > L3 Cache Handle: 0x0013 > > Serial Number: N/A > > Asset Tag: N/A > > Part Number: N/A > > Core Count: 4 > > Core Enabled: 4 > > Characteristics: > > 64-bit capable > > ok - your machine has the following schematic.. [from google] > > http://www.qdpma.com/SystemArchitecture_files/013_Opteron.png > > > > >From what you say - it looks like each chip has 4cores, and 2 > > > dual-channel memory controllers for each of them. 
> > > > > > The question is - does the hardware provide scalable memory-bandwidth > > > per core? Most machines don't. > > > > > > > This point is not clear for me right now. > > Hm.. the point is: the hardware designer had 2 choices: > > - provide a single memory controller per core [so each core gets only > 2.7gb/s - i.e 4 memory controllers per CPU, and common L2 cache > across all cores not possible] > > - provide a single memory controller with 2-dual memory channels [i.e > 10.8GB/s] thats shared by 1-4 cores. With this - there can be a > single L2 cache for all 4 cores. > > Which of the above 2 is a good design? The first one provides scalable > performance - but the second one doesn't. Also the first one limits > the performance of sequential [np=1 applications]. The second one > provides all bandwidth to even np=1 codes - so they might have better > sequential performane. And then performance differences due to different > cache synchronization issues.. > > Satish > > Thanks a lot, Satish. It is much clear now. But for the choice of the two, the program dmidecode does not show this information. Do you know any way to get it? > > > > > > > > > > > I.e the same 5.4*2GB/s is avilable for 1 core run as well as the 4 core > > > run. > > > > > > So if the algorithm is able to use 5.4GB/s [or more] for 1 threads, > > > 10.8 [or more] for 2 threads - you would just see scalable performance > > > from 1 to 2, and 3, 4 would perhaps be slightly incremental to the > > > 2-core performance. > > > > > > Satish > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Dec 22 11:53:04 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 22 Dec 2010 11:53:04 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, 22 Dec 2010, Yongjun Chen wrote: > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > > > Thanks a lot, Satish. It is much clear now. But for the choice of the two, > the program dmidecode does not show this information. Do you know any way to > get it? why do you expect dmidecode to show that? You'll have to look for the CPU/chipset hardware documentation - and look at the details - and sometimes they mention these details.. Satish From yjxd.chen at gmail.com Wed Dec 22 12:11:12 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 19:11:12 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote: > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > > > > > Thanks a lot, Satish. It is much clear now. But for the choice of the > two, > > the program dmidecode does not show this information. Do you know any way > to > > get it? > > why do you expect dmidecode to show that? > > You'll have to look for the CPU/chipset hardware documentation - and > look at the details - and sometimes they mention these details.. > > Satish > Thanks, Satish. Yes, I need to check it. Just now I re-configured PETSC with the option --with-device=ch3:nemsis. The results are almost the same as --with-device=ch3:sock. As can be seen in the attachment. I hope the matrix partitioning - reordering algorithm would have some positive effects. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:41:47 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 333.681 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:47:21 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 18:47:21 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.558e+02 1.00000 3.558e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.190e+08 1.09702 2.140e+08 1.712e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.5581e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.5404e+02 1.6 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 1876 MatMultTranspose 1473 1.0 1.4721e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 1962 MatAssemblyBegin 1 1.0 6.0289e-0316.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.2618e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.0790e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0855e+0112.8 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 9.9344e+0120.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 12 1 0 0 66 12 1 0 0 66 70 VecNorm 1475 1.0 5.6723e+00 2.9 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 613 VecCopy 4 1.0 5.5063e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.1978e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 8.6108e+00 1.3 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1209 VecAYPX 2944 1.0 6.0635e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1144 VecAssemblyBegin 6 1.0 4.8455e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.5286e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7080e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 399 VecScatterBegin 2947 1.0 1.8601e+00 2.6 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 9.0296e+0116.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 9.8538e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.2263e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 91100100100 99 91100100100 99 1887 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.7381e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 397 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 4.98295e-06 Average time for MPI_Barrier(): 9.76086e-05 Average time for zero size MPI_Send(): 2.81334e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib 
-ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 5 of total 12 on wmss04Process 11 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 1 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 10 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:55:12 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 241.392 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:59:13 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 18:59:13 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.594e+02 1.00000 2.594e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 2.004e+08 1.11689 1.956e+08 2.348e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.5935e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.1203e+02 1.5 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 2579 MatMultTranspose 1473 1.0 9.9342e+01 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2906 MatAssemblyBegin 1 1.0 3.7930e-03 8.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.1536e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.2507e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2744e+0166.4 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 5.4256e+0115.3 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 128 VecNorm 1475 1.0 7.3386e+00 5.2 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 473 VecCopy 4 1.0 6.2873e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.5036e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 7.4288e+00 1.8 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1401 VecAYPX 2944 1.0 5.0487e+00 2.5 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1374 VecAssemblyBegin 6 1.0 3.4969e-0211.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.5075e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 7.2035e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 482 VecScatterBegin 2947 1.0 2.5759e+00 2.7 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 5.1555e+0111.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 KSPSetup 1 1.0 8.2631e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2851e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 88100100100 99 88100100100 99 2664 PCSetUp 1 1.0 7.1526e-06 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 7.2339e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 480 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 7.82013e-06 Average time for MPI_Barrier(): 9.52244e-05 Average time for zero size MPI_Send(): 2.15769e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib 
-ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 16 on wmss04 Process 8 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process Process 3 of total 16 on wmss04 Process 15 of total 16 on wmss04 7 of total 16 on wmss04Process 1 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 13 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 Process 11 of total 16 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly:Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly.End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:50:47 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.23596e-06 Norm of error 1.23596e-06, Iterations 1481 ========================================================= The solver has finished successfully! ========================================================= The solving time is 227.888 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:54:35 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 18:54:35 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.442e+02 1.00001 2.442e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.922e+10 1.13060 3.822e+10 6.116e+11 Flops/sec: 1.606e+08 1.13060 1.565e+08 2.504e+09 MPI Messages: 1.187e+04 3.99916 7.051e+03 1.128e+05 MPI Message Lengths: 1.929e+09 7.80850 1.819e+05 2.052e+10 MPI Reductions: 4.501e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.4422e+02 100.0% 6.1159e+11 100.0% 1.128e+05 100.0% 1.819e+05 100.0% 4.485e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1482 1.0 1.1549e+02 2.0 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2513 MatMultTranspose 1481 1.0 9.3652e+01 1.4 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 32 47 50 50 0 32 47 50 50 0 3097 MatAssemblyBegin 1 1.0 4.6110e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.1871e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 5.1212e-04 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2031e+01123.8 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2962 1.0 7.2313e+0122.5 4.36e+08 1.0 0.0e+00 0.0e+00 3.0e+03 13 1 0 0 66 13 1 0 0 66 96 VecNorm 1483 1.0 5.2508e+00 4.6 2.18e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 665 VecCopy 4 1.0 3.2623e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8891 1.0 2.5386e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4444 1.0 6.6341e+00 1.6 6.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1578 VecAYPX 2960 1.0 4.2830e+00 1.7 4.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1628 VecAssemblyBegin 6 1.0 4.0186e-0213.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 6.0081e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2964 1.0 6.2569e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 558 VecScatterBegin 2963 1.0 2.9219e+00 4.0 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2963 1.0 5.0568e+01 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 5.8019e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.1573e+02 1.0 3.92e+10 1.1 1.1e+05 1.8e+05 4.4e+03 88100100100 99 88100100100 99 2834 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2964 1.0 6.2830e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 556 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 42424600 0 Vec 18 18 7924896 0 Vec Scatter 2 2 1736 0 Index Set 4 4 247632 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.38998e-05 Average time for MPI_Barrier(): 0.00011363 Average time for zero size MPI_Send(): 2.03103e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl 
-lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ From jed at 59A2.org Wed Dec 22 16:08:37 2010 From: jed at 59A2.org (Jed Brown) Date: Wed, 22 Dec 2010 23:08:37 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 18:53, Satish Balay wrote: > You'll have to look for the CPU/chipset hardware documentation - and > look at the details - and sometimes they mention these details.. > hwloc shows the cache hierarchy. http://www.open-mpi.org/projects/hwloc/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Dec 22 18:43:30 2010 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Dec 2010 16:43:30 -0800 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D1204A9.7090209@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> <4D11BFCA.2030800@tu-dresden.de> <4D1204A9.7090209@tu-dresden.de> Message-ID: On Wed, Dec 22, 2010 at 6:01 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > So, I found the problem related to empty partitions. It is not possible to > weight vertices (i.e. elements of a mesh) in such a way that one weight is > much higher than the other ones. For more details see > > http://glaros.dtc.umn.edu/flyspray/task/11 > > Its a pity that ParMetis makes is very hard to find this kind of errors. > > The open question for me is about the non continuous partitions. Is it a > normal behavior of ParMetis to create partitions that are not continous? Yes, this is normal. Matt > > Thomas > > > Thomas Witkowski wrote: > >> Okay, in my computations, I have empty partitions on some ranks and >> definitely not >> minimal boundary sizes. So may be I generate a wrong input. But if this is >> the case, I >> wonder why the resulting mesh partitioning is quite good. If I neglect the >> problem of >> empty partitions, the redistributed mesh leads to a very good load >> balancing. Is there >> any meaningful way to debug the problem? Is there something link a >> "verbose mode" in >> ParMetis that says me whats happen on the input data? Otherwise I have to >> print all the >> input data to the screen and check it by hand. Although I have a quite >> small example with >> 128 overall coarse mesh elements on 8 ranks, this is not big fun :) >> >> Thomas >> >> @Matthew: By mistake I've answered your mail directly to you and not to >> the mailing list, therefore I sent it now here again >> >> Matthew Knepley wrote: >> >>> On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.de > >>> wrote: >>> >>> Hi, >>> >>> I have a not directly PETSc related question, but I hope to get >>> some answer from the community here. In my FEM code, I make use of >>> ParMETIS to partition the mesh. I make direct use of this library >>> and not of PETSc's ParMETIS integration. The initial partition is >>> always fine, but I use the ParMETIS_V3_AdaptiveRepart function for >>> repartition the mesh due to local mesh adaption. In most cases, >>> the result is fine, but there are two points, where I have trouble >>> with: >>> >>> 1) Sometimes ParMETIS generates empty partitions, i.e., a >>> processor has zero mesh elements. This is something my code cannot >>> handle. Is this a bug or a feature? If it is a feature, is there >>> any possiblity to disable it? 
>>>
>>>
>>> ParMetis has a balance constraint if you weight vertices. This will
>>> enforce equal size partitions.
>>>
>>> 2) In most cases the specific partitions are not connected. If I
>>> put all data to ParMETIS in a correct way, is this okay? My code
>>> can handle it, but is slows down the computation due to larger
>>> interior boundaries and therefore to more communications.
>>>
>>>
>>> ParMetis minimizes the overall boundary size, so I do not understand how
>>> you could see this slowdown.
>>>
>>> Matt
>>>
>>> Does anyone of you know an answer to these question? Is there a
>>> debug mode in ParMETIS, where I can see which data is set to its
>>> function calls?
>>>
>>> Regards,
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>
>>
>>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com  Wed Dec 22 19:03:52 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 22 Dec 2010 17:03:52 -0800
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen wrote:
>
>
> On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
>
>> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>>
>> > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay
>> wrote:
>> >
>> > > Thanks a lot, Satish. It is much clear now. But for the choice of the
>> two,
>> > the program dmidecode does not show this information. Do you know any
>> way to
>> > get it?
>>
>> why do you expect dmidecode to show that?
>>
>> You'll have to look for the CPU/chipset hardware documentation - and
>> look at the details - and sometimes they mention these details..
>>
>> Satish
>>
>
>
> Thanks, Satish. Yes, I need to check it.
> Just now I re-configured PETSC with the option --with-device=ch3:nemsis.
> The results are almost the same as --with-device=ch3:sock. As can be seen in
> the attachment.
> I hope the matrix partitioning - reordering algorithm would have some
> positive effects.
>

1) To see a large gain, the ordering you start with would have to be very
bad. Maybe it is. These orderings try to minimize bandwidth, which means
minimizing communication in the MatMult.

2) If you use incomplete factorization, the ordering can have a large effect
on conditioning, and thus on the number of iterations, which does not improve
scalability. This would impact scalability if you use a parallel IC; however,
all those packages reorder your matrix already.

In short, I suspect this will not help a lot, except maybe with conditioning,
which is what I was referring to in the quote.

   Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
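A minimal sketch of what the reordering discussed above can look like through
PETSc's ordering interface. This is not code from the thread: it assumes an
already assembled AIJ-format matrix A whose type supports MatPermute(), uses
RCM as just one possible bandwidth-reducing ordering, and follows the
petsc-3.1 calling conventions used elsewhere in this thread.

    IS             rperm, cperm;
    Mat            Aperm;
    PetscErrorCode ierr;

    /* Compute a reverse Cuthill-McKee ordering of A and build the permuted matrix. */
    ierr = MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm);CHKERRQ(ierr);
    ierr = MatPermute(A, rperm, cperm, &Aperm);CHKERRQ(ierr);

    /* Solve with Aperm instead of A.  The right-hand side has to be permuted the
       same way (e.g. VecPermute(b, rperm, PETSC_FALSE)), and the computed solution
       mapped back with the inverse permutation after the solve. */

    ierr = ISDestroy(rperm);CHKERRQ(ierr);
    ierr = ISDestroy(cperm);CHKERRQ(ierr);

Whether this pays off depends, as noted above, on how bad the initial
ordering is; a mesh generator that already numbers the unknowns well leaves
little to gain.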
From jed at 59A2.org  Wed Dec 22 20:32:00 2010
From: jed at 59A2.org (Jed Brown)
Date: Thu, 23 Dec 2010 03:32:00 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

I disagree, there is easily a factor of two in flop/s between a naive
ordering (e.g. hierarchical by node type in a finite element method) and a
good low-bandwidth ordering.

This is in the FUN3D papers and still true today, in my experience.

Incomplete factorization is also very order dependent, as you note.

Jed

On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote:

On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen wrote:
>
>
>
> On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
>>
>> On Wed, 22 De...

1) To see a large gain, the ordering you start with would have to be very
bad. Maybe it is. These orderings try to minimize bandwidth, which means
minimize communication in the MatMult.

2) If you use incomplete facotrization, the ordering can have a large effect
on conditioning, so number of iterations, which does not improve scalability.
This would impact scalability if you use a parallel IC, however all those
packages reorder your matrix already.

In short, I suspect this will not help a lot, except maybe with
conditioning, which is what I was refering to in the quote.

Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yjxd.chen at gmail.com  Thu Dec 23 02:28:48 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Thu, 23 Dec 2010 09:28:48 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

Matt, Jed, thanks a lot for the discussions. Since the ordering could
minimize the bandwidth, I think it is really worth trying the matrix
partitioning / ordering. If there is a factor of two increase in the flop
rate, that's quite promising!

On Thu, Dec 23, 2010 at 3:32 AM, Jed Brown wrote:

> I disagree, there is easily a factor of two in flop/s between a naive
> ordering (e.g. hierarchical by node type in a finite element method) and a
> good low-bandwidth ordering.
>
> This is in the FUN3D papers and still true today, in my experience.
>
> Incomplete factorization is also very order dependent, as you note.
>
> Jed
>
> On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote:
>
> On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen
> wrote:
> >
> >
> > On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
> >>
> >> On Wed, 22 De...
>
> 1) To see a large gain, the ordering you start with would have to be very
> bad. Maybe it is. These
> orderings try to minimize bandwidth, which means minimize communication
> in the MatMult.
>
> 2) If you use incomplete facotrization, the ordering can have a large
> effect on conditioning, so
> number of iterations, which does not improve scalability. This would
> impact scalability if you
> use a parallel IC, however all those packages reorder your matrix
> already.
>
> In short, I suspect this will not help a lot, except maybe with
> conditioning, which is what I was refering to in the quote.
>
> Matt
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more...
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From knepley at gmail.com Thu Dec 23 19:45:04 2010 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 23 Dec 2010 17:45:04 -0800 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:32 PM, Jed Brown wrote: > I disagree, there is easily a factor of two in flop/s between a naive > ordering (e.g. hierarchical by node type in a finite element method) and a > good low-bandwidth ordering. > > This is in the FUN3D papers and still true today, in my experience. > There is no doubt that this difference can exist, but many mesh generators (such as triangle) give back a good ordering. FUN3D used an inexplicably bad ordering. Matt > Incomplete factorization is also very order dependent, as you note. > > Jed > > On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote: > > On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen > wrote: > > > > > > > > > On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote: > >> > >> On Wed, 22 De... > > 1) To see a large gain, the ordering you start with would have to be very > bad. Maybe it is. These > orderings try to minimize bandwidth, which means minimize communication > in the MatMult. > > 2) If you use incomplete facotrization, the ordering can have a large > effect on conditioning, so > number of iterations, which does not improve scalability. This would > impact scalability if you > use a parallel IC, however all those packages reorder your matrix > already. > > In short, I suspect this will not help a lot, except maybe with > conditioning, which is what I was refering to in the quote. > > Matt > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more... > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 25 22:17:44 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 25 Dec 2010 22:17:44 -0600 Subject: [petsc-users] Using PETSc from MATLAB code, experimental Message-ID: <28CA197A-02C8-41FD-BF2D-E344FBC3CF97@mcs.anl.gov> PETSc users, It is now possible to write MATLAB programs (sequential) that use PETSc KSP, SNES, and TS solvers directly in MATLAB. The code is still experimental and incomplete. But if you are interested in trying it out, get the development release of PETSc http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html join the development mailing list petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/miscellaneous/mailing-lists.html, read bin/matlab/classes/PetscInitialize.m, configure and make PETSc and join the fun. We are definitely in need of more developers for this code. Barry From PRaeth at hpti.com Mon Dec 27 09:09:18 2010 From: PRaeth at hpti.com (Raeth, Peter) Date: Mon, 27 Dec 2010 15:09:18 +0000 Subject: [petsc-users] Meaning of MatStencil Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM> Am a new PETSc user trying to make use of the matrix level to create and operate on matrices whose memory exceeds that available on any one node. To populate a distributed dense matrix with results of other matrix calculations we are trying to use MatSetValuesBlockedStencil. Two of the inputs to that function require structures of type MatStencil. 
After searching the archives, tutorials, examples, and Google, I can not find
anything that explains what the values of MatStencil are meant to do relative
to where in the target matrix to place the block of values.

Would someone please point me in the right direction?

Thanks,

Peter.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aron.ahmadia at kaust.edu.sa  Mon Dec 27 09:44:50 2010
From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia)
Date: Mon, 27 Dec 2010 10:44:50 -0500
Subject: [petsc-users] Meaning of MatStencil
In-Reply-To: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>
References: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>
Message-ID:

MatStencil only makes sense if you are using a distributed grid (DA), where
it corresponds to physical field locations. You probably just want
MatSetValuesBlocked (
http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html
)

Warm Regards,
Aron

On Mon, Dec 27, 2010 at 10:09 AM, Raeth, Peter wrote:

> Am a new PETSc user trying to make use of the matrix level to create
> and operate on matrices whose memory exceeds that available on any one node.
> To populate a distributed dense matrix with results of other matrix
> calculations we are trying to use MatSetValuesBlockedStencil. Two of the
> inputs to that function require structures of type MatStencil. After
> searching the archives, tutorials, examples, and Google, I can not find
> anything that explains what the values of MatStencil are meant to do
> relative to where in the target matrix to place the block of values.
> Would someone please point me in the right direction?
>
> Thanks,
>
> Peter.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From PRaeth at hpti.com  Mon Dec 27 09:47:25 2010
From: PRaeth at hpti.com (Raeth, Peter)
Date: Mon, 27 Dec 2010 15:47:25 +0000
Subject: [petsc-users] Meaning of MatStencil
In-Reply-To:
References: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>,
Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9A859@CORTINA.HPTI.COM>

AH !   Thanks very much Aron.

Best,

Peter.

________________________________
From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Aron Ahmadia [aron.ahmadia at kaust.edu.sa]
Sent: Monday, December 27, 2010 10:44 AM
To: PETSc users list
Subject: Re: [petsc-users] Meaning of MatStencil

MatStencil only makes sense if you are using a distributed grid (DA), where
it corresponds to physical field locations. You probably just want
MatSetValuesBlocked
(http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html)

Warm Regards,
Aron

On Mon, Dec 27, 2010 at 10:09 AM, Raeth, Peter > wrote:
Am a new PETSc user trying to make use of the matrix level to create and
operate on matrices whose memory exceeds that available on any one node. To
populate a distributed dense matrix with results of other matrix calculations
we are trying to use MatSetValuesBlockedStencil. Two of the inputs to that
function require structures of type MatStencil. After searching the archives,
tutorials, examples, and Google, I can not find anything that explains what
the values of MatStencil are meant to do relative to where in the target
matrix to place the block of values. Would someone please point me in the
right direction?

Thanks,

Peter.

-------------- next part --------------
An HTML attachment was scrubbed...
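A small sketch of the MatSetValuesBlocked() usage referenced above, for
readers coming to this thread from the archives. It is illustrative only: the
matrix A, its block size of 2 (set beforehand with MatSetBlockSize()), and
the particular block indices are assumptions rather than code from this
exchange; values are interpreted row-major by default.

    PetscErrorCode ierr;
    PetscInt       brow    = 3;              /* global block-row index     */
    PetscInt       bcol    = 5;              /* global block-column index  */
    PetscScalar    vals[4] = {1.0, 2.0,      /* one 2x2 block, row-major   */
                              3.0, 4.0};

    /* Insert the 2x2 block at block position (3,5); with block size 2 this
       touches the scalar entries (6,10), (6,11), (7,10) and (7,11). */
    ierr = MatSetValuesBlocked(A, 1, &brow, 1, &bcol, vals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

For a distributed dense matrix, plain MatSetValues() with scalar row/column
indices is often just as convenient; the blocked variant mainly pays off when
the matrix really is organized in fixed-size blocks (e.g. BAIJ formats).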
URL: From gdiso at ustc.edu Wed Dec 29 23:00:19 2010 From: gdiso at ustc.edu (Gong Ding) Date: Thu, 30 Dec 2010 13:00:19 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix Message-ID: Dear all, I found the pastix solver petsc-3.1-p4 ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 For a poisson problem with symmetric matrix, pastix works well. However, for unsymmetric problem, the code break. valgrind reported that: Check : Sort CSC OK ==4959== Invalid read of size 4 ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) ==4959== [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; may be block not allocated with PetscMalloc() The test problems used to work well under other linear solvers such as MUMPS and superlu. Does any meet this problem before? Yours Gong Ding From PRaeth at hpti.com Thu Dec 30 06:51:58 2010 From: PRaeth at hpti.com (Raeth, Peter) Date: Thu, 30 Dec 2010 12:51:58 +0000 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9AA61@CORTINA.HPTI.COM> Good morning Ding. Sorry to not have an answer since I do not try to solve that type of problem. But, are the matrices formed and populated in a way that is appropriate to solution with PETSc? If that is the case then there does appear to be an issue with PETSc itself. I'll bet the developers would love to have your example. 
My own development exercise is to compute the Kronecker tensor product where all components are distributed. Am using the Matrix and Vector level of PETSc, with some use of higher levels. It is a most interesting undertaking. Best, Peter. From bsmith at mcs.anl.gov Thu Dec 30 08:55:08 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 08:55:08 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: Please run program with -ksp_view_binary and send us the file binaryoutput to petsc-maint at mcs.anl.gov so we can reproduce the problem. Barry On Dec 29, 2010, at 11:00 PM, Gong Ding wrote: > Dear all, > > I found the pastix solver > > petsc-3.1-p4 > ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 > > For a poisson problem with symmetric matrix, pastix works well. > However, for unsymmetric problem, the code break. valgrind reported that: > Check : Sort CSC OK > ==4959== Invalid read of size 4 > ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) > ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) > ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) > ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) > ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) > ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) > ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) > ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) > ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) > ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) > ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) > ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) > ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd > ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) > ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) > ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) > ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) > ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) > ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) > ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) > ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) > ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) > ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) > ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) > ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) > ==4959== > [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c > [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; > may be block not allocated with PetscMalloc() > > The test problems used to work well under other linear solvers such as MUMPS and superlu. > Does any meet this problem before? 
> > Yours > Gong Ding > From gdiso at ustc.edu Thu Dec 30 20:44:20 2010 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 31 Dec 2010 10:44:20 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix References: Message-ID: <3D98059130724B9FA0F5D2BB1F21C849@cogendaeda> ----- Original Message ----- From: "Barry Smith" To: "PETSc users list" Sent: Thursday, December 30, 2010 10:55 PM Subject: Re: [petsc-users] pastix solver break at pastix_checkMatrix Please run program with -ksp_view_binary and send us the file binaryoutput to petsc-maint at mcs.anl.gov so we can reproduce the problem. Barry Ok, I use pastix as linear solver of SNES. (The coding style of pastix seems much better than mumps. I'd like to try if I can speed it by GPU-based BLAS.) The settings are listed as follows: ierr = KSPSetType (ksp, (char*) KSPPREONLY); assert(!ierr); ierr = PCSetType (pc, (char*) PCLU); assert(!ierr); ierr = PCFactorSetMatSolverPackage (pc, MAT_SOLVER_PASTIX); assert(!ierr); ierr = PCFactorSetReuseFill(pc, PETSC_TRUE);assert(!ierr); ierr = PCFactorSetReuseOrdering(pc, PETSC_TRUE); assert(!ierr); ierr = PCFactorSetColumnPivot(pc, 1.0); genius_assert(!ierr); //ierr = PCFactorReorderForNonzeroDiagonal(pc, 1e-20); assert(!ierr);<-- Caught signal number 11 SEGV error will occure when diag value < 1e-20 ierr = PCFactorSetShiftType(pc,MAT_SHIFT_NONZERO);assert(!ierr); The attached matrix is dumped at the end of jacobian assemble. -------------- next part -------------- A non-text attachment was scrubbed... Name: break.mat Type: application/octet-stream Size: 208768 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Dec 30 20:45:25 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 20:45:25 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: Yes, there is an error in how we called one of the pastix routines. It required certain arrays be obtained with a raw malloc(). Please find attached a new src/mat/impls/aij/mpi/pastix/pastix.c put it in that location and run make in that directory. Thanks for reporting the problem and sending the valgrind output Barry On Dec 29, 2010, at 11:00 PM, Gong Ding wrote: > Dear all, > > I found the pastix solver > > petsc-3.1-p4 > ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 > > For a poisson problem with symmetric matrix, pastix works well. > However, for unsymmetric problem, the code break. 
valgrind reported that: > Check : Sort CSC OK > ==4959== Invalid read of size 4 > ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) > ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) > ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) > ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) > ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) > ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) > ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) > ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) > ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) > ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) > ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) > ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) > ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd > ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) > ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) > ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) > ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) > ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) > ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) > ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) > ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) > ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) > ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) > ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) > ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) > ==4959== > [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c > [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; > may be block not allocated with PetscMalloc() > > The test problems used to work well under other linear solvers such as MUMPS and superlu. > Does any meet this problem before? > > Yours > Gong Ding > -------------- next part -------------- A non-text attachment was scrubbed... Name: pastix.c Type: application/octet-stream Size: 27993 bytes Desc: not available URL: From gdiso at ustc.edu Thu Dec 30 21:49:01 2010 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 31 Dec 2010 11:49:01 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix References: Message-ID: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> Dear Barry, First, the patched file has some evident problem. PetscScalar *tmpvalues; PetscInt *tmprows,*tmpcolptr; tmpvalues = malloc(nnz*sizeof(PetscScalar)); tmprows = malloc(nnz*sizeof(PetscInt)); tmpcolptr = malloc((*n+1)*sizeof(PetscInt)); ierr = PetscMalloc3(nnz,PetscScalar,&tmpvalues,nnz,PetscInt,&tmprows,(*n+1),PetscInt,&tmpcolptr);CHKERRQ(ierr); <-- this line alloc meory again. After comment above line, the pastix works for the first nonlinear iteration. However, it breaks at the second iteration. valgrind reported: DDM Solver Level 1 init... Using PaStiX linear solver... 
Compute equilibrium its | Eq(V) | | Eq(n) | | Eq(p) | | Eq(T) | |Eq(Tn)| |Eq(Tp)| |delta x| ----------------------------------------------------------------------------- 0 2.50e-06 2.34e-03 3.12e-04 0.00e+00* 0.00e+00* 0.00e+00* 0.00e+00* Check : ordering OK Check : Graph Symmetry Correction Add 4090 null terms OK Check : Sort CSC OK 1 2.06e-05 7.29e-04 1.03e-04 0.00e+00* 0.00e+00* 0.00e+00* 3.85e-01 Check : ordering OK Check : Graph Symmetry==1416== Thread 1: ==1416== Invalid read of size 4 ==1416== at 0x1BC3186: csc_checksym (csc_utils.c:321) ==1416== by 0x1B4E7E3: pastix_checkMatrix (pastix.c:3915) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8aeb1a8 is not stack'd, malloc'd or (recently) free'd ==1416== Correction==1416== Invalid read of size 4 ==1416== at 0x1BD1147: correct2 (cscsymcsc.c:77) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8aeb1a8 is not stack'd, malloc'd or (recently) free'd ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD10D7: correct2 (cscsymcsc.c:67) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD10EB: correct2 (cscsymcsc.c:72) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX 
(pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1102: correct2 (cscsymcsc.c:75) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8d369dc is 4 bytes before a block of size 4,216 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x150812B: MatConvertToCSC (pastix.c:169) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD116D: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) 
==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1179: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8c1b55c is 4 bytes before a block of size 4,212 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x1BD1029: correct2 (cscsymcsc.c:53) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1186: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid write of size 4 ==1416== at 0x1BD1192: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) 
==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8c1b55c is 4 bytes before a block of size 4,212 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x1BD1029: correct2 (cscsymcsc.c:53) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below From bsmith at mcs.anl.gov Thu Dec 30 22:01:05 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 22:01:05 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> Message-ID: <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov> Well that, obviously was erroneously left in. Remove that line, recompile in that directory and it should run ok. Barry On Dec 30, 2010, at 9:49 PM, Gong Ding wrote: > Dear Barry, > First, the patched file has some evident problem. > > PetscScalar *tmpvalues; > PetscInt *tmprows,*tmpcolptr; > tmpvalues = malloc(nnz*sizeof(PetscScalar)); > tmprows = malloc(nnz*sizeof(PetscInt)); > tmpcolptr = malloc((*n+1)*sizeof(PetscInt)); > > ierr = PetscMalloc3(nnz,PetscScalar,&tmpvalues,nnz,PetscInt,&tmprows,(*n+1),PetscInt,&tmpcolptr);CHKERRQ(ierr); <-- this line alloc meory again. > > After comment above line, the pastix works for the first nonlinear iteration. However, it breaks at the second iteration. valgrind reported: > > DDM Solver Level 1 init... > Using PaStiX linear solver... 
From gdiso at ustc.edu Thu Dec 30 22:39:52 2010
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 31 Dec 2010 12:39:52 +0800
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov>
Message-ID:

Dear Barry,
As I mentioned, it only works for the first nonlinear iteration and will break at the second! I guess the MatConvertToCSC function should be considered again, i.e., optimized for a nonlinear solver that will call PaStiX many times.

Gong Ding

> Well, that obviously was erroneously left in. Remove that line, recompile in that directory, and it should run OK.
>
>    Barry
From gdiso at ustc.edu Thu Dec 30 22:47:52 2010
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 31 Dec 2010 12:47:52 +0800
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov>
Message-ID: <76538ECE086E482FB1366C668A1EA28A@cogendaeda>

Dear Barry,
Sorry, I should add that when I set DIFFERENT_NONZERO_PATTERN as the MatStructure, PaStiX works well, but it crashes with SAME_NONZERO_PATTERN.
However, I have always used SAME_NONZERO_PATTERN before, and it works for the KSP solvers, MUMPS, SuperLU_DIST, etc.

Yours,
Gong Ding

From bsmith at mcs.anl.gov Fri Dec 31 12:12:29 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 31 Dec 2010 12:12:29 -0600
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
In-Reply-To: <76538ECE086E482FB1366C668A1EA28A@cogendaeda>
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov> <76538ECE086E482FB1366C668A1EA28A@cogendaeda>
Message-ID:

Sorry, yes, there was another bug. What caused both of these problems is that PaStiX requires a symmetric nonzero structure, and the interface assumed that the PETSc matrix had a symmetric nonzero structure, which is usually true; hence it did not crash for most of the matrices people use. I've attached another copy of pastix.c; follow the same procedure as before.

   Barry

On Dec 30, 2010, at 10:47 PM, Gong Ding wrote:

> Dear Barry,
> Sorry, I should add that when I set DIFFERENT_NONZERO_PATTERN as the MatStructure, PaStiX works well, but it crashes with SAME_NONZERO_PATTERN.
> However, I have always used SAME_NONZERO_PATTERN before, and it works for the KSP solvers, MUMPS, SuperLU_DIST, etc.
>
> Yours,
> Gong Ding

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastix.c
Type: application/octet-stream
Size: 28044 bytes
Desc: not available
URL:
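Until the fixed pastix.c is verified, the workaround Gong Ding describes amounts to returning DIFFERENT_NONZERO_PATTERN from the application's Jacobian routine. A minimal sketch follows (PETSc 3.1 calling sequence); FormJacobian and its body are placeholders rather than code from his solver, and the actual matrix assembly is elided:

    #include <petscsnes.h>

    /* Hypothetical sketch: the MatStructure flag returned here is what SNES
     * hands to the linear solver, so reporting DIFFERENT_NONZERO_PATTERN
     * should make the PaStiX interface redo its symbolic setup at every
     * Newton step instead of reusing cached data. */
    static PetscErrorCode FormJacobian(SNES snes,Vec x,Mat *J,Mat *B,
                                       MatStructure *flag,void *ctx)
    {
      PetscErrorCode ierr;
      /* ... assemble the Jacobian entries of *B (and *J) from x here ... */
      ierr  = MatAssemblyBegin(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr  = MatAssemblyEnd(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *flag = DIFFERENT_NONZERO_PATTERN;  /* workaround: was SAME_NONZERO_PATTERN */
      return 0;
    }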
From pengxwang at hotmail.com Fri Dec 31 12:23:12 2010
From: pengxwang at hotmail.com (Peter Wang)
Date: Fri, 31 Dec 2010 12:23:12 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
Message-ID:

I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com Fri Dec 31 12:28:05 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 31 Dec 2010 12:28:05 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
In-Reply-To:
References:
Message-ID:

On Fri, Dec 31, 2010 at 12:23 PM, Peter Wang wrote:
> I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.

There are two limitations:

a) Machine memory. To combat this, we recommend running in parallel.

b) Representation of row/col numbers by integers. If you have > 2B rows, you will need to reconfigure using --with-64-bit-ints.

Thanks,

    Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at mcs.anl.gov Fri Dec 31 12:39:28 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 31 Dec 2010 12:39:28 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
In-Reply-To:
References:
Message-ID:

On Dec 31, 2010, at 12:28 PM, Matthew Knepley wrote:

> On Fri, Dec 31, 2010 at 12:23 PM, Peter Wang wrote:
> I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.
>
> There are two limitations:
>
> a) Machine memory. To combat this, we recommend running in parallel.
>
> b) Representation of row/col numbers by integers. If you have > 2B rows, you will need to reconfigure using --with-64-bit-ints.

   --with-64-bit-indices

> Thanks,
>
>     Matt
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
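As a rough illustration of point (b) above, here is a sketch (not code from the thread, PETSc 3.1-era calling sequence assumed): a 100M-by-100M sparse matrix still fits comfortably within 32-bit indices, and --with-64-bit-indices only becomes necessary once the global dimension passes roughly 2.1 billion; whether enough aggregate memory exists for the nonzeros is the separate issue point (a) addresses.

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      PetscInt       N = 100000000;  /* 100M rows/columns: fits in a 32-bit PetscInt */
      PetscErrorCode ierr;

      /* For more than ~2.1 billion rows, PetscInt must be 64 bits wide, i.e.
       * PETSc has to be configured with --with-64-bit-indices.  The matrix
       * itself would normally be preallocated, filled, and assembled before
       * being handed to KSP; that part is elided here. */
      ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      ierr = MatDestroy(A);CHKERRQ(ierr);   /* PETSc 3.1 calling sequence */
      ierr = PetscFinalize();
      return 0;
    }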