From lizs at mail.uc.edu Wed Dec 1 00:15:07 2010 From: lizs at mail.uc.edu (Li, Zhisong (lizs)) Date: Wed, 1 Dec 2010 06:15:07 +0000 Subject: [petsc-users] What's the binary function corresponding to "PetscViewerASCIIPrintf"? Message-ID: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> Hi, Petsc Team, I once read that when we write large volume of data into an output data file, it's best to write in binary format rather than in ASCII format. I forget the origin of this statement and could not find it any more. What I found in the example code is about writing array into a binary file. Besides data stored in arrays, I also want to include some other info such as document title, data dimensions and line changing, which are not arrays, into the output file. So a "PetscViewerASCIIPrintf" is very convenient for doing this if we use ASCII format. But I could not find its correspondent function for binary format. Can you tell me how to do this? A sample code will be most helpful. Best Regards, Zhisong Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 1 08:06:19 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 08:06:19 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: Message-ID: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> --with-64-bit-indices=1 You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) ^^^^ ^^^^^ --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example PetscInt mone mone = 1 > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) but better just build PETSc without the --with-64-bit-indices Barry On Nov 30, 2010, at 10:33 PM, Peter Wang wrote: > I am trying to create a matrix and insert values to it. The martix is supposed to be as following: > > 1 0 0 0 > 0 2 0 0 > 0 0 3 0 > 0 0 0 4 > > array coef[] is the diagonal value of the matrix, > snr[] is the index of the row, rnr[] is the index of column. > > However, I always get the wrong results. It shows the Column too large: col 4607182418800017408 max 3! I cheked the value of rnr[]. The output snr and rnr is correct: > snr= 0 1 2 3 > rnr= 0 1 2 3 > > It seems there is something wrong when MatSetValues() is called. Following is a part of the error information. The information is shown at each loop of do II=Istart,Iend-1 > > The output (if any) follows: > snr= 0 1 2 3 > rnr= 0 1 2 3 > 8.....Check after MatGetOwnershipRange() Istart= 0 Iend= 4 > II= 0 1 0 0 > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > [0]PETSC ERROR: Argument out of range! > [0]PETSC ERROR: Column too large: col 4607182418800017408 max 3! > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Debug_PETSc_MatCreate_20101130 on a linux-gnu named compute-1-35.hpc.local.uwm by pwang_a Tue Nov 30 22:27:03 2010 > [0]PETSC ERROR: Libraries linked from /sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1/lib > [0]PETSC ERROR: Configure run at Fri Oct 8 12:59:16 2010 > [0]PETSC ERROR: Configure options --prefix=/sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1 --with-mpi-dir=/sharedapps/uwm/common/gcc-4.4.3/openmpi/1.3.2-v1 --with-blas-lapack-dir=/sharedapps/uwm/ceas/gcc-4.4.3/lapack/3.2.2-v1/lib --with-64-bit-indices=1 --with-64-bit-pointers=1 --with-large-file-io=1 --with-x=0 > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: MatSetValues_SeqAIJ() line 193 in src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatSetValues() line 992 in src/mat/interface/matrix.c > > > > !The code is as following: > !============================= > program Debug_PETSc_MatCreate_20101130 > implicit none > ! > #include "finclude/petscsys.h" > #include "finclude/petscvec.h" > #include "finclude/petscmat.h" > #include "finclude/petscpc.h" > #include "finclude/petscksp.h" > ! Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! PETSc Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > real*8 norm > PetscInt i,j,II,JJ,its !,m,n > PetscInt Istart,Iend,ione > PetscErrorCode ierr > PetscMPIInt myid,numprocs > PetscTruth flg > PetscScalar v,one,neg_one > Vec x,b,u > Mat A_petsc > KSP ksp > PetscInt,parameter:: n_nz=4 > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Other Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > !parameter::n_nz=4 > Real*8::Coef(n_nz) > PetscInt::snr(n_nz),rnr(n_nz) > data Coef /1., 2., 3. , 4./ > data snr /0, 1, 2, 3/ > data rnr /0, 1 , 2, 3/ > ! Body of Debug_PETSc_MatCreate_20101130 > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Beginning of program > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > write(*,"('snr=',4i4)")snr > write(*,"('rnr=',4i4)")rnr > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > call MatSetFromOptions(A_Petsc,ierr) > ! write(*,*)A_petsc > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > write(*,'(1a,1i7,1a,1i7)') & > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > do II=Istart,Iend-1 > ione=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,ione,snr(ione),rnr(ione) !output snr and rnr for error check > call MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione),INSERT_VALUES,ierr) > enddo > > write(*,'(1a)')'9.....Check after MatSetValues()' > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > write(*,'(1a)')'10.....Check after MatCreate()' > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > ! call KSPDestroy(ksp,ierr) > ! call VecDestroy(u,ierr) > ! call VecDestroy(x,ierr) > ! 
call VecDestroy(b,ierr) > call MatDestroy(A_petsc,ierr) > call PetscFinalize(ierr) > end program Debug_PETSc_MatCreate_20101130 > !===================================== From bsmith at mcs.anl.gov Wed Dec 1 08:12:27 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 08:12:27 -0600 Subject: [petsc-users] What's the binary function corresponding to "PetscViewerASCIIPrintf"? In-Reply-To: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> References: <88D7E3BB7E1960428303E7601003745142E57DB3@BL2PRD0103MB055.prod.exchangelabs.com> Message-ID: You can write raw strings into binary viewers with PetscViewerBinaryWrite(), BUT you will need to have your program that processes the data in the binary file be able to read out the strings. Another alternative is to use PetscViewerBinaryGetInfoPointer() and write your additional information into the corresponding .info ASCII file that is automatically generated. Again whatever program that processes your data will need to read that file to get the information out of it. Barry On Dec 1, 2010, at 12:15 AM, Li, Zhisong (lizs) wrote: > Hi, Petsc Team, > > > I once read that when we write large volume of data into an output data file, it's best to write in binary format rather than in ASCII format. I forget the origin of this statement and could not find it any more. What I found in the example code is about writing array into a binary file. Besides data stored in arrays, I also want to include some other info such as document title, data dimensions and line changing, which are not arrays, into the output file. So a "PetscViewerASCIIPrintf" is very convenient for doing this if we use ASCII format. But I could not find its correspondent function for binary format. Can you tell me how to do this? A sample code will be most helpful. > > > Best Regards, > > Zhisong Li > From pengxwang at hotmail.com Wed Dec 1 15:44:16 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 1 Dec 2010 15:44:16 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: Thanks, I changed the '1' to PetscInt ione, However, the error still comes out. do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) ^^ ^^^ enddo BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? !===The modified code is == program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! 
Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 1 15:48:47 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 1 Dec 2010 15:48:47 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov Barry On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > Thanks, > > I changed the '1' to PetscInt ione, However, the error still comes out. 
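The pattern Barry describes, with the count declared as a PetscInt and explicitly set to 1 (his "PetscInt mone; mone = 1" example), looks as follows in PETSc's C interface. The thread's code is Fortran, so this is only an illustration, and the helper name InsertDiagonal and its arguments are hypothetical rather than taken from the program above.

#include <petscmat.h>

/* Hypothetical helper, not from the thread: insert coef(i) at (snr(i), rnr(i))
   for the locally owned rows Istart..Iend-1.  Every count and index argument
   is a PetscInt, so the call is also correct in a --with-64-bit-indices build. */
PetscErrorCode InsertDiagonal(Mat A,PetscInt Istart,PetscInt Iend,
                              const PetscInt snr[],const PetscInt rnr[],
                              const PetscScalar coef[])
{
  PetscInt       i,ione = 1;            /* the '1' count, typed as PetscInt */
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValues(A,ione,&snr[i],ione,&rnr[i],&coef[i],INSERT_VALUES);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

The usual MatAssemblyBegin()/MatAssemblyEnd() pair still has to follow the insertion loop, exactly as in the original program.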
> > do II=Istart,Iend-1 > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > ^^ ^^^ > enddo > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > !===The modified code is == > program Debug_PETSc_MatCreate_20101130 > implicit none > ! > #include "finclude/petscsys.h" > #include "finclude/petscvec.h" > #include "finclude/petscmat.h" > #include "finclude/petscpc.h" > #include "finclude/petscksp.h" > ! Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! PETSc Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > real*8 norm > PetscInt i,j,II,JJ,its !,m,n > PetscInt Istart,Iend,ione,mone > PetscErrorCode ierr > PetscMPIInt myid,numprocs > PetscTruth flg > PetscScalar v,one,neg_one > Vec x,b,u > Mat A_petsc > KSP ksp > PetscInt,parameter::n_nz=4 > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Other Variables > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > PetscInt::snr(n_nz),rnr(n_nz) > !parameter::n_nz=4 > PetscReal::Coef(n_nz) > data Coef /1., 2., 3. , 4./ > data snr /0, 1, 2, 3/ > data rnr /0, 1 , 2, 3/ > ! Body of Debug_PETSc_MatCreate_20101130 > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > ! Beginning of program > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > write(*,"('snr=',4i4)")snr > write(*,"('rnr=',4i4)")rnr > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > call MatSetFromOptions(A_Petsc,ierr) > ! write(*,*)A_petsc > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > write(*,'(1a,1i7,1a,1i7)') & > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > do II=Istart,Iend-1 > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > enddo > > write(*,'(1a)')'9.....Check after MatSetValues()' > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > write(*,'(1a)')'10.....Check after MatCreate()' > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > ! call KSPDestroy(ksp,ierr) > ! call VecDestroy(u,ierr) > ! call VecDestroy(x,ierr) > ! call VecDestroy(b,ierr) > call MatDestroy(A_petsc,ierr) > call PetscFinalize(ierr) > end program Debug_PETSc_MatCreate_20101130 > > > > From: bsmith at mcs.anl.gov > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > --with-64-bit-indices=1 > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. 
> > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > ^^^^ ^^^^^ > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > PetscInt mone > > mone = 1 > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > but better just build PETSc without the --with-64-bit-indices > > > > Barry > > > > From pengxwang at hotmail.com Thu Dec 2 11:20:36 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 11:20:36 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, Message-ID: Thanks, I changed the '1' to PetscInt ione, However, the error still comes out. do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) ^^ ^^^ enddo BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? !===The modified code is == program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do II=Istart,Iend-1 mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! 
call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Thu Dec 2 12:17:37 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 12:17:37 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> References: , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov> Message-ID: I sent two emails for replying thie topic. However, I didn't get the email of myself from petsc-users-bounces at mcs.anl.gov . I am wondering if the email system has something wrong? Sorry if the resent email bohters anyone. In the new version of the code I defined PetscInt II,JJ,ione, mone,snr[] and rnr[], PetscReal coef[], and modified the following portion. However, the error is still there. Is there any ohter reason I didn't figure out? BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? ! ====the modified loop======= do I=Istart,Iend-1 mone=I+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) II=snr(mone) JJ=rnr(mone) v=coef(mone) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,II,ione,JJ,v,INSERT_VALUES,ierr) enddo ! ============the whole program code modified==================== program Debug_PETSc_MatCreate_20101130 implicit none ! #include "finclude/petscsys.h" #include "finclude/petscvec.h" #include "finclude/petscmat.h" #include "finclude/petscpc.h" #include "finclude/petscksp.h" ! Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! PETSc Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - real*8 norm PetscInt i,j,II,JJ,its !,m,n PetscInt Istart,Iend,ione,mone PetscErrorCode ierr PetscMPIInt myid,numprocs PetscTruth flg PetscScalar v,one,neg_one Vec x,b,u Mat A_petsc KSP ksp PetscInt,parameter::n_nz=4 !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Other Variables !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - PetscInt::snr(n_nz),rnr(n_nz) !parameter::n_nz=4 PetscReal::Coef(n_nz) data Coef /1., 2., 3. , 4./ data snr /0, 1, 2, 3/ data rnr /0, 1 , 2, 3/ ! Body of Debug_PETSc_MatCreate_20101130 ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) write(*,"('snr=',4i4)")snr write(*,"('rnr=',4i4)")rnr call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? call MatSetFromOptions(A_Petsc,ierr) ! write(*,*)A_petsc call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) write(*,'(1a,1i7,1a,1i7)') & '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend do I=Istart,Iend-1 mone=I+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) II=snr(mone) JJ=rnr(mone) v=coef(mone) write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) call MatSetValues(A_Petsc,ione,II,ione,JJ,v,INSERT_VALUES,ierr) enddo write(*,'(1a)')'9.....Check after MatSetValues()' call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) write(*,'(1a)')'10.....Check after MatCreate()' call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call KSPDestroy(ksp,ierr) ! call VecDestroy(u,ierr) ! call VecDestroy(x,ierr) ! call VecDestroy(b,ierr) call MatDestroy(A_petsc,ierr) call PetscFinalize(ierr) end program Debug_PETSc_MatCreate_20101130 !===================End of the code============================ > From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 08:06:19 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > > --with-64-bit-indices=1 > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > ^^^^ ^^^^^ > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > PetscInt mone > mone = 1 > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > but better just build PETSc without the --with-64-bit-indices > > Barry > > > On Nov 30, 2010, at 10:33 PM, Peter Wang wrote: > > > I am trying to create a matrix and insert values to it. The martix is supposed to be as following: > > > > 1 0 0 0 > > 0 2 0 0 > > 0 0 3 0 > > 0 0 0 4 > > > > array coef[] is the diagonal value of the matrix, > > snr[] is the index of the row, rnr[] is the index of column. > > > > However, I always get the wrong results. It shows the Column too large: col 4607182418800017408 max 3! I cheked the value of rnr[]. The output snr and rnr is correct: > > snr= 0 1 2 3 > > rnr= 0 1 2 3 > > > > It seems there is something wrong when MatSetValues() is called. Following is a part of the error information. The information is shown at each loop of do II=Istart,Iend-1 > > > > The output (if any) follows: > > snr= 0 1 2 3 > > rnr= 0 1 2 3 > > 8.....Check after MatGetOwnershipRange() Istart= 0 Iend= 4 > > II= 0 1 0 0 > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > > [0]PETSC ERROR: Argument out of range! > > [0]PETSC ERROR: Column too large: col 4607182418800017408 max 3! > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. 
> > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Debug_PETSc_MatCreate_20101130 on a linux-gnu named compute-1-35.hpc.local.uwm by pwang_a Tue Nov 30 22:27:03 2010 > > [0]PETSC ERROR: Libraries linked from /sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1/lib > > [0]PETSC ERROR: Configure run at Fri Oct 8 12:59:16 2010 > > [0]PETSC ERROR: Configure options --prefix=/sharedapps/uwm/ceas/gcc-4.4.3/petsc/3.1-p5-v1 --with-mpi-dir=/sharedapps/uwm/common/gcc-4.4.3/openmpi/1.3.2-v1 --with-blas-lapack-dir=/sharedapps/uwm/ceas/gcc-4.4.3/lapack/3.2.2-v1/lib --with-64-bit-indices=1 --with-64-bit-pointers=1 --with-large-file-io=1 --with-x=0 > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: MatSetValues_SeqAIJ() line 193 in src/mat/impls/aij/seq/aij.c > > [0]PETSC ERROR: MatSetValues() line 992 in src/mat/interface/matrix.c > > > > > > > > !The code is as following: > > !============================= > > program Debug_PETSc_MatCreate_20101130 > > implicit none > > ! > > #include "finclude/petscsys.h" > > #include "finclude/petscvec.h" > > #include "finclude/petscmat.h" > > #include "finclude/petscpc.h" > > #include "finclude/petscksp.h" > > ! Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! PETSc Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > real*8 norm > > PetscInt i,j,II,JJ,its !,m,n > > PetscInt Istart,Iend,ione > > PetscErrorCode ierr > > PetscMPIInt myid,numprocs > > PetscTruth flg > > PetscScalar v,one,neg_one > > Vec x,b,u > > Mat A_petsc > > KSP ksp > > PetscInt,parameter:: n_nz=4 > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Other Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > !parameter::n_nz=4 > > Real*8::Coef(n_nz) > > PetscInt::snr(n_nz),rnr(n_nz) > > data Coef /1., 2., 3. , 4./ > > data snr /0, 1, 2, 3/ > > data rnr /0, 1 , 2, 3/ > > ! Body of Debug_PETSc_MatCreate_20101130 > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Beginning of program > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > write(*,"('snr=',4i4)")snr > > write(*,"('rnr=',4i4)")rnr > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > call MatSetFromOptions(A_Petsc,ierr) > > ! write(*,*)A_petsc > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > write(*,'(1a,1i7,1a,1i7)') & > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > do II=Istart,Iend-1 > > ione=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,ione,snr(ione),rnr(ione) !output snr and rnr for error check > > call MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione),INSERT_VALUES,ierr) > > enddo > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > write(*,'(1a)')'10.....Check after MatCreate()' > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > ! 
call KSPDestroy(ksp,ierr) > > ! call VecDestroy(u,ierr) > > ! call VecDestroy(x,ierr) > > ! call VecDestroy(b,ierr) > > call MatDestroy(A_petsc,ierr) > > call PetscFinalize(ierr) > > end program Debug_PETSc_MatCreate_20101130 > > !===================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Thu Dec 2 14:02:00 2010 From: pengxwang at hotmail.com (Peter Wang) Date: Thu, 2 Dec 2010 14:02:00 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> References: , , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, , <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Message-ID: Thanks, Dr. Simth, If the PETSc can be installed on the supurcomputer by me? The software is currently installed by the network manager of the supercomputer. The example code : ex2f.F from http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-2.3.3/src/ksp/ksp/examples/tutorials/ex2f.F is compiled and run on the same supercomputer. There is no error coming out when ex2f.F runs. I am just trying to implement my own matrix into the code. The variation in my code is that the indices of the matrix is arrays, while that in example code is II and JJ. However, in the latest version of my code, I already assigned the arrays of indices to PetscInt II,JJ. Unfortunately, the error still comes out. It's kind of confusing that the own coded program just doen't work well. Thanks for your suggestion. in> From: bsmith at mcs.anl.gov > Date: Wed, 1 Dec 2010 15:48:47 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] column index in MatSetValues() > > > Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. > > You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). > > Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov > > > Barry > > On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > > > Thanks, > > > > I changed the '1' to PetscInt ione, However, the error still comes out. > > > > do II=Istart,Iend-1 > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > > ^^ ^^^ > > enddo > > > > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > > > !===The modified code is == > > program Debug_PETSc_MatCreate_20101130 > > implicit none > > ! > > #include "finclude/petscsys.h" > > #include "finclude/petscvec.h" > > #include "finclude/petscmat.h" > > #include "finclude/petscpc.h" > > #include "finclude/petscksp.h" > > ! Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! PETSc Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > real*8 norm > > PetscInt i,j,II,JJ,its !,m,n > > PetscInt Istart,Iend,ione,mone > > PetscErrorCode ierr > > PetscMPIInt myid,numprocs > > PetscTruth flg > > PetscScalar v,one,neg_one > > Vec x,b,u > > Mat A_petsc > > KSP ksp > > PetscInt,parameter::n_nz=4 > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! 
Other Variables > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > PetscInt::snr(n_nz),rnr(n_nz) > > !parameter::n_nz=4 > > PetscReal::Coef(n_nz) > > data Coef /1., 2., 3. , 4./ > > data snr /0, 1, 2, 3/ > > data rnr /0, 1 , 2, 3/ > > ! Body of Debug_PETSc_MatCreate_20101130 > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > ! Beginning of program > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > write(*,"('snr=',4i4)")snr > > write(*,"('rnr=',4i4)")rnr > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > call MatSetFromOptions(A_Petsc,ierr) > > ! write(*,*)A_petsc > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > write(*,'(1a,1i7,1a,1i7)') & > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > do II=Istart,Iend-1 > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > > enddo > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > write(*,'(1a)')'10.....Check after MatCreate()' > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > ! call KSPDestroy(ksp,ierr) > > ! call VecDestroy(u,ierr) > > ! call VecDestroy(x,ierr) > > ! call VecDestroy(b,ierr) > > call MatDestroy(A_petsc,ierr) > > call PetscFinalize(ierr) > > end program Debug_PETSc_MatCreate_20101130 > > > > > > > From: bsmith at mcs.anl.gov > > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > > To: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > > > > > --with-64-bit-indices=1 > > > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > > ^^^^ ^^^^^ > > > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > > > PetscInt mone > > > mone = 1 > > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > > > but better just build PETSc without the --with-64-bit-indices > > > > > > Barry > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 2 14:12:34 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 14:12:34 -0600 Subject: [petsc-users] column index in MatSetValues() In-Reply-To: References: , , <3B6954F4-6AF1-4D6B-88E9-354F0527C783@mcs.anl.gov>, , <770AD023-D593-4399-9A27-D45481EED555@mcs.anl.gov> Message-ID: <942B3EDF-32D8-42DE-94EF-9304EAD344E8@mcs.anl.gov> On Dec 2, 2010, at 2:02 PM, Peter Wang wrote: > Thanks, Dr. Simth, > > If the PETSc can be installed on the supurcomputer by me? Yes, PETSc is just a library of source code. Anyone can install it,. 
http://www.mcs.anl.gov/petsc/petsc-as/documentation/installation.html Barry > The software is currently installed by the network manager of the supercomputer. > > The example code : ex2f.F from http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-2.3.3/src/ksp/ksp/examples/tutorials/ex2f.F is compiled and run on the same supercomputer. > There is no error coming out when ex2f.F runs. I am just trying to implement my own matrix into the code. The variation in my code is that the indices of the matrix is arrays, while that in example code is II and JJ. However, in the latest version of my code, I already assigned the arrays of indices to PetscInt II,JJ. Unfortunately, the error still comes out. It's kind of confusing that the own coded program just doen't work well. > > Thanks for your suggestion. > > > in> From: bsmith at mcs.anl.gov > > Date: Wed, 1 Dec 2010 15:48:47 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > Humm, the problem is still very likely related to a miss-match between 4 byte and 8 byte integers. > > > > You should just install PETSc yourself (then you have control over it, giving control to someone else whenever doing scientific computing is always dangerous). > > > > Installing PETSc is usually no big deal. If you have problems send configure.log and make.log to petsc-maint at mcs.anl.gov > > > > > > Barry > > > > On Dec 1, 2010, at 3:44 PM, Peter Wang wrote: > > > > > Thanks, > > > > > > I changed the '1' to PetscInt ione, However, the error still comes out. > > > > > > do II=Istart,Iend-1 > > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) ! PetscInt ione and mone; PetscInt snr(n_nz),rnr(n_nz) PetscReal Coef(n_nz) > > > ^^ ^^^ > > > enddo > > > > > > > > > BTW, I am running the code on the clusters of supurcomputer. Where the option ' --with-64-bit-indices=1' shold I find and remove? > > > > > > !===The modified code is == > > > program Debug_PETSc_MatCreate_20101130 > > > implicit none > > > ! > > > #include "finclude/petscsys.h" > > > #include "finclude/petscvec.h" > > > #include "finclude/petscmat.h" > > > #include "finclude/petscpc.h" > > > #include "finclude/petscksp.h" > > > ! Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! PETSc Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > real*8 norm > > > PetscInt i,j,II,JJ,its !,m,n > > > PetscInt Istart,Iend,ione,mone > > > PetscErrorCode ierr > > > PetscMPIInt myid,numprocs > > > PetscTruth flg > > > PetscScalar v,one,neg_one > > > Vec x,b,u > > > Mat A_petsc > > > KSP ksp > > > PetscInt,parameter::n_nz=4 > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! Other Variables > > > !- - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > PetscInt::snr(n_nz),rnr(n_nz) > > > !parameter::n_nz=4 > > > PetscReal::Coef(n_nz) > > > data Coef /1., 2., 3. , 4./ > > > data snr /0, 1, 2, 3/ > > > data rnr /0, 1 , 2, 3/ > > > ! Body of Debug_PETSc_MatCreate_20101130 > > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > ! Beginning of program > > > ! 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > > call MPI_Comm_rank(PETSC_COMM_WORLD,myid,ierr) > > > call MPI_Comm_size(PETSC_COMM_WORLD,numprocs,ierr) > > > write(*,"('snr=',4i4)")snr > > > write(*,"('rnr=',4i4)")rnr > > > call MatCreate(PETSC_COMM_WORLD,A_Petsc,ierr) > > > call MatSetSizes(A_Petsc,PETSC_DECIDE,PETSC_DECIDE,n_nz,n_nz,ierr) !n_nz-1??? > > > call MatSetFromOptions(A_Petsc,ierr) > > > ! write(*,*)A_petsc > > > call MatGetOwnershipRange(A_Petsc,Istart,Iend,ierr) > > > > > > write(*,'(1a,1i7,1a,1i7)') & > > > '8.....Check after MatGetOwnershipRange() Istart=',Istart,' Iend=',Iend > > > do II=Istart,Iend-1 > > > mone=II+1 !(Coef,snr,rnr are 1-based row and column numbers, shifting them to 0-based) > > > write(*,'(1a,4i7)')'II=',II,mone,snr(mone),rnr(mone) > > > call MatSetValues(A_Petsc,ione,snr(mone),ione,rnr(mone),Coef(mone),INSERT_VALUES,ierr) > > > enddo > > > > > > write(*,'(1a)')'9.....Check after MatSetValues()' > > > call MatAssemblyBegin(A_petsc,MAT_FINAL_ASSEMBLY,ierr) > > > call MatAssemblyEnd(A_Petsc,MAT_FINAL_ASSEMBLY,ierr) > > > write(*,'(1a)')'10.....Check after MatCreate()' > > > call MatView(A_Petsc,PETSC_VIEWER_STDOUT_WORLD,ierr) > > > ! call KSPDestroy(ksp,ierr) > > > ! call VecDestroy(u,ierr) > > > ! call VecDestroy(x,ierr) > > > ! call VecDestroy(b,ierr) > > > call MatDestroy(A_petsc,ierr) > > > call PetscFinalize(ierr) > > > end program Debug_PETSc_MatCreate_20101130 > > > > > > > > > > From: bsmith at mcs.anl.gov > > > > Date: Wed, 1 Dec 2010 08:06:19 -0600 > > > > To: petsc-users at mcs.anl.gov > > > > Subject: Re: [petsc-users] column index in MatSetValues() > > > > > > > > > > > > > > > > --with-64-bit-indices=1 > > > > > > > > You only need this option if you are solving problems with over 2 billion unknowns! I recommend removing it otherwise, it wastes memory and slows performance slightly. > > > > > > > > > MatSetValues(A_Petsc,1,snr(Ione),1,rnr(Ione),Coef(Ione) > > > > ^^^^ ^^^^^ > > > > > > > > --with-64-bit-indices means ALL integers passed to PETSc MUST be 64 bit, but here you are passing the integer 1 as a "regular" 32 bit integer. You need to declare it as a PetscInt, for example > > > > > > > > PetscInt mone > > > > mone = 1 > > > > > MatSetValues(A_Petsc,mone,snr(Ione),mone,rnr(Ione),Coef(Ione) > > > > > > > > but better just build PETSc without the --with-64-bit-indices > > > > > > > > Barry > > > > > > > > > > > From vijay.m at gmail.com Thu Dec 2 14:45:25 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 14:45:25 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. Message-ID: Hi all, There is probably some minor inconsistency in my understanding but is there a fundamental difference between the following two options ? // Option 1 ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); // Option 2 ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); Here A is a shell matrix (serial) that has a routine defined to perform MATOP_MULT only. I ask because Option 1 gives me the right result while Option 2 segfaults at the MatMultAdd line. My only logical conclusion is that I need to define a MATOP_MULT_ADD or something similar for the shell for this to work. Is this understanding correct ? 
I implicitly assumed that petsc recognizes that MATOP_MULT has been defined already and since MatMultAdd only requires the action of a matrix on a vector to perform its operation, should this not be computed by petsc automatically ? I really do not want to creat an extra vector here with Option 1 since this occurs at a finer level in my calculation. But is there any way that you would suggest I do this without extra allocations ? Any comments or pointers will be much appreciated. Vijay From bsmith at mcs.anl.gov Thu Dec 2 14:51:39 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 14:51:39 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: References: Message-ID: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: > Hi all, > > There is probably some minor inconsistency in my understanding but is > there a fundamental difference between the following two options ? > > // Option 1 > ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); > ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); > // Option 2 > ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); > > Here A is a shell matrix (serial) that has a routine defined to > perform MATOP_MULT only. > > I ask because Option 1 gives me the right result while Option 2 > segfaults at the MatMultAdd line. My only logical conclusion is that I > need to define a MATOP_MULT_ADD or something similar for the shell for > this to work. Is this understanding correct ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. > I implicitly assumed > that petsc recognizes that MATOP_MULT has been defined already and > since MatMultAdd only requires the action of a matrix on a vector to > perform its operation, should this not be computed by petsc > automatically ? Sorry it doesn't though of course it could. > I really do not want to creat an extra vector here > with Option 1 since this occurs at a finer level in my calculation. > But is there any way that you would suggest I do this without extra > allocations ? Any comments or pointers will be much appreciated. I suggest making a shell MatMultAdd_Mine() that does everything then also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll have one code to routine but can handle both operations. Barry > > Vijay From vijay.m at gmail.com Thu Dec 2 15:02:12 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 15:02:12 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> Message-ID: > ? I suggest making a shell MatMultAdd_Mine() that does everything then > also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll > have one code to routine but can handle both operations. Barry, thanks for the suggestion. I will implement the MatMultAdd in this way. Puristically speaking though, this does reverse the dependency between the two routines ! Vijay On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: > > On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: > >> Hi all, >> >> There is probably some minor inconsistency in my understanding but is >> there a fundamental difference between the following two options ? 
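A minimal, self-contained sketch of registering MATOP_MULT_ADD on a shell matrix is below. The action used here (v3 = v2 + 2*x, a plain scaling) is only a stand-in for the real matrix-free operator, the routine names follow Barry's MatMultAdd_Mine/MatMult_Mine wording later in the thread, and the create/destroy calling sequences are written against a recent PETSc (they differ slightly from the 3.1-era API used in this thread).

#include <petscmat.h>

/* Stand-in matrix-free action: v3 = v2 + 2*x.  MatMultAdd() hands the shell
   routine (A, x, v2, v3). */
static PetscErrorCode MatMultAdd_Mine(Mat A,Vec x,Vec v2,Vec v3)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (v2 != v3) {ierr = VecCopy(v2,v3);CHKERRQ(ierr);}
  ierr = VecAXPY(v3,2.0,x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* MatMult as the special case: zero the output, then reuse MatMultAdd_Mine(). */
static PetscErrorCode MatMult_Mine(Mat A,Vec x,Vec y)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecSet(y,0.0);CHKERRQ(ierr);
  ierr = MatMultAdd_Mine(A,x,y,y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,b;
  PetscInt       n = 8;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  ierr = MatCreateShell(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,n,n,NULL,&A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))MatMult_Mine);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT_ADD,(void (*)(void))MatMultAdd_Mine);CHKERRQ(ierr);

  ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
  ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&b);CHKERRQ(ierr);
  ierr = VecSet(x,1.0);CHKERRQ(ierr);
  ierr = VecSet(b,1.0);CHKERRQ(ierr);

  ierr = MatMultAdd(A,x,b,b);CHKERRQ(ierr);   /* b <- b + A*x, no temporary vector */
  ierr = VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Once MATOP_MULT_ADD is registered this way, both KSP and user code can call MatMultAdd() on the shell directly, which is exactly the Option 2 form that segfaulted above.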
>> >> // Option 1 >> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >> // Option 2 >> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >> >> Here A is a shell matrix (serial) that has a routine defined to >> perform MATOP_MULT only. >> >> I ask because Option 1 gives me the right result while Option 2 >> segfaults at the MatMultAdd line. My only logical conclusion is that I >> need to define a MATOP_MULT_ADD or something similar for the shell for >> this to work. Is this understanding correct ? > > ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. > >> I implicitly assumed >> that petsc recognizes that MATOP_MULT has been defined already and >> since MatMultAdd only requires the action of a matrix on a vector to >> perform its operation, should this not be computed by petsc >> automatically ? > > ? Sorry it doesn't though of course it could. > >> I really do not want to creat an extra vector here >> with Option 1 since this occurs at a finer level in my calculation. >> But is there any way that you would suggest I do this without extra >> allocations ? Any comments or pointers will be much appreciated. > > ? I suggest making a shell MatMultAdd_Mine() that does everything then > also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll > have one code to routine but can handle both operations. > > ? Barry > > > >> >> Vijay > > From bsmith at mcs.anl.gov Thu Dec 2 18:37:56 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 2 Dec 2010 18:37:56 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> Message-ID: <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> On Dec 2, 2010, at 3:02 PM, Vijay S. Mahadevan wrote: >> I suggest making a shell MatMultAdd_Mine() that does everything then >> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >> have one code to routine but can handle both operations. > > Barry, thanks for the suggestion. I will implement the MatMultAdd in > this way. Puristically speaking though, this does reverse the > dependency between the two routines ! Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. Barry > > Vijay > > On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: >> >> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: >> >>> Hi all, >>> >>> There is probably some minor inconsistency in my understanding but is >>> there a fundamental difference between the following two options ? >>> >>> // Option 1 >>> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >>> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >>> // Option 2 >>> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >>> >>> Here A is a shell matrix (serial) that has a routine defined to >>> perform MATOP_MULT only. >>> >>> I ask because Option 1 gives me the right result while Option 2 >>> segfaults at the MatMultAdd line. My only logical conclusion is that I >>> need to define a MATOP_MULT_ADD or something similar for the shell for >>> this to work. Is this understanding correct ? >> >> Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. 
>> >>> I implicitly assumed >>> that petsc recognizes that MATOP_MULT has been defined already and >>> since MatMultAdd only requires the action of a matrix on a vector to >>> perform its operation, should this not be computed by petsc >>> automatically ? >> >> Sorry it doesn't though of course it could. >> >>> I really do not want to creat an extra vector here >>> with Option 1 since this occurs at a finer level in my calculation. >>> But is there any way that you would suggest I do this without extra >>> allocations ? Any comments or pointers will be much appreciated. >> >> I suggest making a shell MatMultAdd_Mine() that does everything then >> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >> have one code to routine but can handle both operations. >> >> Barry >> >> >> >>> >>> Vijay >> >> From vijay.m at gmail.com Thu Dec 2 22:46:09 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 22:46:09 -0600 Subject: [petsc-users] Shell matrix and MatMultAdd. In-Reply-To: <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> References: <435AB5ED-CEE2-4170-98FD-2CF1AA89D427@mcs.anl.gov> <03C42D84-A752-41D3-B60F-8FC4F30ADBD4@mcs.anl.gov> Message-ID: > Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. This is true and can be done but just that I truly believe in the philosophy of building higher functions based on atomistic actions. Here, MatMultAdd is just a composite of MatMult and a VecAXPY, two basic operations. From a design stand-point, it is just confusing for me to look at it as a top-down scenario. Just my 2 cents. Again, from my implementation stand-point, I am going to proceed as you suggested because I just want to get my code working for now... Vijay On Thu, Dec 2, 2010 at 6:37 PM, Barry Smith wrote: > > On Dec 2, 2010, at 3:02 PM, Vijay S. Mahadevan wrote: > >>> ? I suggest making a shell MatMultAdd_Mine() that does everything then >>> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >>> have one code to routine but can handle both operations. >> >> Barry, thanks for the suggestion. I will implement the MatMultAdd in >> this way. Puristically speaking though, this does reverse the >> dependency between the two routines ! > > ? Well one could argue that MatMult() is a special case of MatMultAdd() with a zero vector input. > > ?Barry > >> >> Vijay >> >> On Thu, Dec 2, 2010 at 2:51 PM, Barry Smith wrote: >>> >>> On Dec 2, 2010, at 2:45 PM, Vijay S. Mahadevan wrote: >>> >>>> Hi all, >>>> >>>> There is probably some minor inconsistency in my understanding but is >>>> there a fundamental difference between the following two options ? >>>> >>>> // Option 1 >>>> ierr = MatMult(A, solution, temporaryvec) ;CHKERRQ(ierr); >>>> ierr = VecAXPY(rhs, 1.0, temporaryvec) ;CHKERRQ(ierr); >>>> // Option 2 >>>> ierr = MatMultAdd(A, solution, rhs, rhs) ;CHKERRQ(ierr); >>>> >>>> Here A is a shell matrix (serial) that has a routine defined to >>>> perform MATOP_MULT only. >>>> >>>> I ask because Option 1 gives me the right result while Option 2 >>>> segfaults at the MatMultAdd line. My only logical conclusion is that I >>>> need to define a MATOP_MULT_ADD or something similar for the shell for >>>> this to work. Is this understanding correct ? >>> >>> ? Yes, if your code will use a MatMultAdd() then you need to provide that to the shell matrix. 
>>> >>>> I implicitly assumed >>>> that petsc recognizes that MATOP_MULT has been defined already and >>>> since MatMultAdd only requires the action of a matrix on a vector to >>>> perform its operation, should this not be computed by petsc >>>> automatically ? >>> >>> ? Sorry it doesn't though of course it could. >>> >>>> I really do not want to creat an extra vector here >>>> with Option 1 since this occurs at a finer level in my calculation. >>>> But is there any way that you would suggest I do this without extra >>>> allocations ? Any comments or pointers will be much appreciated. >>> >>> ? I suggest making a shell MatMultAdd_Mine() that does everything then >>> also code a MatMult_Mine() that zeros the output vector and then calls directly MatMultAdd_Mine(). This way you'll >>> have one code to routine but can handle both operations. >>> >>> ? Barry >>> >>> >>> >>>> >>>> Vijay >>> >>> > > From vijay.m at gmail.com Thu Dec 2 23:02:44 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 2 Dec 2010 23:02:44 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? Message-ID: Hi all, I was wondering whether the MG preconditioner object is generic enough to work out of the box like say ILU or SOR. To elaborate on this, if I can provide the number of levels, restriction and prolongation operators for each level and the system operators along with vectors allocated for solution and rhs, would it work as a preconditioner for my given problem and a prescribed rhs at the finest level of PCMG. Or does it need some knowledge of the fine and coarser meshes to perform the MG operations correctly ? All the examples I've seen using MG in petsc involve the DA and DMMG objects and since I use my own mesh and corresponding discretization code for an elliptic system, I'm curious about this usage. It would not be terribly difficult to write my own framework to do a simple V-cycle with my existing framework but since petsc already provides this functionality along with different types of MG solves (with verified code!), I really want to use it for my system. Any help and/or pointers are welcome. Thanks, vijay From jed at 59A2.org Fri Dec 3 03:43:22 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 10:43:22 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 06:02, Vijay S. Mahadevan wrote: > I was wondering whether the MG preconditioner object is generic enough > to work out of the box like say ILU or SOR. To elaborate on this, if > I can provide the number of levels, restriction and prolongation > operators for each level and the system operators along with vectors > allocated for solution and rhs, would it work as a preconditioner for > my given problem and a prescribed rhs at the finest level of PCMG. Or > does it need some knowledge of the fine and coarser meshes to perform > the MG operations correctly ? > PCMG is purely algebraic so it does not need knowledge about the mesh, just interpolation/restriction. However, if you want to use non-Galerkin coarse operators, then you will need a coarse mesh (this is something that DMMG facilitates). Look at section 4.4.7 of the users manual for explanation of using PCMG. There don't seem to be any well-documented examples using PCMG directly, but you might want to look at src/ksp/examples/tests/ex19.c or src/snes/examples/tests/ex11.c. Jed -------------- next part -------------- An HTML attachment was scrubbed... 
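To make the purely algebraic setup Jed describes concrete, a sketch of the PCMG calls involved is below. It assumes the caller already has a fine-grid operator and one interpolation matrix per coarse/fine level pair; the function name and arguments are placeholders, and KSPSetOperators() is written against a recent PETSc (the 3.1-era version takes an extra MatStructure argument).

#include <petscksp.h>

/* Placeholder setup routine: nlevels levels, level 0 coarsest.  P[l] is the
   interpolation from level l-1 to level l, for l = 1..nlevels-1. */
PetscErrorCode SetupAlgebraicPCMG(KSP ksp,Mat Afine,PetscInt nlevels,Mat *P)
{
  PC             pc;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetOperators(ksp,Afine,Afine);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
  ierr = PCMGSetLevels(pc,nlevels,NULL);CHKERRQ(ierr);
  ierr = PCMGSetType(pc,PC_MG_MULTIPLICATIVE);CHKERRQ(ierr);    /* V-cycle */
  for (l = 1; l < nlevels; l++) {
    /* only interpolation is required; restriction defaults to its transpose */
    ierr = PCMGSetInterpolation(pc,l,P[l]);CHKERRQ(ierr);
  }
  /* Coarse operators: either run with -pc_mg_galerkin so PETSc forms R A P,
     as Dave mentions, or attach your own operator to each level's smoother. */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

No mesh information enters anywhere; the hierarchy is defined entirely by the interpolation matrices.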
URL: From dave.mayhem23 at gmail.com Fri Dec 3 03:44:49 2010 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 3 Dec 2010 10:44:49 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Hey Vijay, PCMG is generic. If you provide the operators for each level, along with the restriction and prolongation, you can use PCMG. It doesn't need to know about the mesh. You don't actually need to provide the coarse grid operators. Given the fine grid operator and R and optionally P, you can use Galerkin coarsening by calling PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin Also, if you don't specify the prolongation, petsc will use P = R^T. Cheers, Dave On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > Hi all, > > I was wondering whether the MG preconditioner object is generic enough > to work out of the box like say ILU or SOR. ?To elaborate on this, if > I can provide the number of levels, restriction and prolongation > operators for each level and the system operators along with vectors > allocated for solution and rhs, would it work as a preconditioner for > my given problem and a prescribed rhs at the finest level of PCMG. Or > does it need some knowledge of the fine and coarser meshes to perform > the MG operations correctly ? > > All the examples I've seen using MG in petsc involve the DA and DMMG > objects and since I use my own mesh and corresponding discretization > code for an elliptic system, I'm curious about this usage. It would > not be terribly difficult to write my own framework to do a simple > V-cycle with my existing framework but since petsc already provides > this functionality along with different types of MG solves (with > verified code!), I really want to use it for my system. Any help > and/or pointers are welcome. > > Thanks, > vijay > From knepley at gmail.com Fri Dec 3 11:02:45 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Dec 2010 11:02:45 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: I will also note that a good intro for implementing your own might be the ML PC in Petsc. It puts the ML AMG package into the PCMG framework. Matt On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > Hey Vijay, > PCMG is generic. If you provide the operators for each level, along > with the restriction and prolongation, > you can use PCMG. It doesn't need to know about the mesh. > > You don't actually need to provide the coarse grid operators. > Given the fine grid operator and R and optionally P, you can use > Galerkin coarsening by calling > PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin > Also, if you don't specify the prolongation, petsc will use P = R^T. > > > Cheers, > Dave > > > On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > > Hi all, > > > > I was wondering whether the MG preconditioner object is generic enough > > to work out of the box like say ILU or SOR. To elaborate on this, if > > I can provide the number of levels, restriction and prolongation > > operators for each level and the system operators along with vectors > > allocated for solution and rhs, would it work as a preconditioner for > > my given problem and a prescribed rhs at the finest level of PCMG. Or > > does it need some knowledge of the fine and coarser meshes to perform > > the MG operations correctly ? 
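As a concrete illustration of the purely algebraic setup Dave describes (fine operator plus restriction matrices, with Galerkin coarse operators), a sketch in C might look like the following; Afine, the restriction matrices R[l], and the vectors b and x are assumed to have been created by the application, and nlevels is only illustrative:

KSP            ksp;
PC             pc;
PetscInt       l,nlevels = 3;              /* illustrative */
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,Afine,Afine,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
ierr = PCMGSetLevels(pc,nlevels,PETSC_NULL);CHKERRQ(ierr);
for (l=1; l<nlevels; l++) {
  /* R[l] restricts from level l to level l-1; if no interpolation is set, P = R^T is used */
  ierr = PCMGSetRestriction(pc,l,R[l]);CHKERRQ(ierr);
}
/* Galerkin coarse operators (R A P on each level) can then be requested with
   -pc_mg_galerkin on the command line, or with PCMGSetGalerkin() */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Nothing in this setup refers to a mesh; the hierarchy is defined entirely by the operators and transfer matrices handed to PCMG.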
> > > > All the examples I've seen using MG in petsc involve the DA and DMMG > > objects and since I use my own mesh and corresponding discretization > > code for an elliptic system, I'm curious about this usage. It would > > not be terribly difficult to write my own framework to do a simple > > V-cycle with my existing framework but since petsc already provides > > this functionality along with different types of MG solves (with > > verified code!), I really want to use it for my system. Any help > > and/or pointers are welcome. > > > > Thanks, > > vijay > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 12:29:40 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 12:29:40 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Jed and Dave, thanks for the explanation. Now I do understand that the MG PC is generic if I have restriction/prolongation operators for every level. But I do not have the fine grid operator on hand explicitly (only a shell matrix with MatMult) and technically all my coarser grid operators will also be matrix-free. I was originally planning to hand these shell matrices to petsc as the coarse operators but can petsc do this by itself with only access to the fine grid operator ? Basically my problem is that I cannot afford to create a matrix until the coarsest level and I plan to use HYPRE BoomerAMG on the coarsest level. With this hierarchy, would MatMult enabled shell matrices provide enough leeway to get the optimal performance from MG ? Matt, I will look into the ML framework also. Thanks. Vijay On Fri, Dec 3, 2010 at 11:02 AM, Matthew Knepley wrote: > I will also note that a good intro for implementing your own might be the ML > PC > in Petsc. It puts the ML AMG package into the PCMG framework. > ?? Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: >> >> Hey Vijay, >> ?PCMG is generic. If you provide the operators for each level, along >> with the restriction and prolongation, >> you can use PCMG. It doesn't need to know about the mesh. >> >> You don't actually need to provide the coarse grid operators. >> Given the fine grid operator and R and optionally P, you can use >> Galerkin coarsening by calling >> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >> Also, if you don't specify the prolongation, petsc will use P = R^T. >> >> >> Cheers, >> ?Dave >> >> >> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >> > Hi all, >> > >> > I was wondering whether the MG preconditioner object is generic enough >> > to work out of the box like say ILU or SOR. ?To elaborate on this, if >> > I can provide the number of levels, restriction and prolongation >> > operators for each level and the system operators along with vectors >> > allocated for solution and rhs, would it work as a preconditioner for >> > my given problem and a prescribed rhs at the finest level of PCMG. Or >> > does it need some knowledge of the fine and coarser meshes to perform >> > the MG operations correctly ? >> > >> > All the examples I've seen using MG in petsc involve the DA and DMMG >> > objects and since I use my own mesh and corresponding discretization >> > code for an elliptic system, I'm curious about this usage. 
It would >> > not be terribly difficult to write my own framework to do a simple >> > V-cycle with my existing framework but since petsc already provides >> > this functionality along with different types of MG solves (with >> > verified code!), I really want to use it for my system. Any help >> > and/or pointers are welcome. >> > >> > Thanks, >> > vijay >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From rlmackie862 at gmail.com Fri Dec 3 12:33:39 2010 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 3 Dec 2010 10:33:39 -0800 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Are there any examples that show how to use the ML PC? Randy On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: > I will also note that a good intro for implementing your own might be the ML PC > in Petsc. It puts the ML AMG package into the PCMG framework. > > Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > Hey Vijay, > PCMG is generic. If you provide the operators for each level, along > with the restriction and prolongation, > you can use PCMG. It doesn't need to know about the mesh. > > You don't actually need to provide the coarse grid operators. > Given the fine grid operator and R and optionally P, you can use > Galerkin coarsening by calling > PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin > Also, if you don't specify the prolongation, petsc will use P = R^T. > > > Cheers, > Dave > > > On 3 December 2010 06:02, Vijay S. Mahadevan wrote: > > Hi all, > > > > I was wondering whether the MG preconditioner object is generic enough > > to work out of the box like say ILU or SOR. To elaborate on this, if > > I can provide the number of levels, restriction and prolongation > > operators for each level and the system operators along with vectors > > allocated for solution and rhs, would it work as a preconditioner for > > my given problem and a prescribed rhs at the finest level of PCMG. Or > > does it need some knowledge of the fine and coarser meshes to perform > > the MG operations correctly ? > > > > All the examples I've seen using MG in petsc involve the DA and DMMG > > objects and since I use my own mesh and corresponding discretization > > code for an elliptic system, I'm curious about this usage. It would > > not be terribly difficult to write my own framework to do a simple > > V-cycle with my existing framework but since petsc already provides > > this functionality along with different types of MG solves (with > > verified code!), I really want to use it for my system. Any help > > and/or pointers are welcome. > > > > Thanks, > > vijay > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 3 12:34:21 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:34:21 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 19:29, Vijay S. Mahadevan wrote: > Jed and Dave, thanks for the explanation. Now I do understand that the > MG PC is generic if I have restriction/prolongation operators for > every level. 
But I do not have the fine grid operator on hand > explicitly (only a shell matrix with MatMult) and technically all my > coarser grid operators will also be matrix-free. I was originally > planning to hand these shell matrices to petsc as the coarse > operators but can petsc do this by itself with only access to the fine > grid operator ? > PETSc cannot magically create intermediate-level shell operators. If you want to do everything matrix-free, then you have to provide everything that needs to be "smart": restriction/interpolation, smoothers, and residuals. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 3 12:36:32 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:36:32 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 19:33, Randall Mackie wrote: > Are there any examples that show how to use the ML PC? Run any example that uses assembled matrices (almost all of them) with -pc_type ml. It exposes the full multigrid hierarchy so all the usual options for PCMG work. ML-specific options are only available through the options database (we don't have interface functions for all of them). Run with -pc_type ml -help | grep pc_ml_ to see them. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 3 12:36:54 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Dec 2010 12:36:54 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 12:33 PM, Randall Mackie wrote: > Are there any examples that show how to use the ML PC? > -pc_type ml It is Algebraic Multigrid. Matt > Randy > > > On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: > > I will also note that a good intro for implementing your own might be the > ML PC > in Petsc. It puts the ML AMG package into the PCMG framework. > > Matt > > On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: > >> Hey Vijay, >> PCMG is generic. If you provide the operators for each level, along >> with the restriction and prolongation, >> you can use PCMG. It doesn't need to know about the mesh. >> >> You don't actually need to provide the coarse grid operators. >> Given the fine grid operator and R and optionally P, you can use >> Galerkin coarsening by calling >> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >> Also, if you don't specify the prolongation, petsc will use P = R^T. >> >> >> Cheers, >> Dave >> >> >> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >> > Hi all, >> > >> > I was wondering whether the MG preconditioner object is generic enough >> > to work out of the box like say ILU or SOR. To elaborate on this, if >> > I can provide the number of levels, restriction and prolongation >> > operators for each level and the system operators along with vectors >> > allocated for solution and rhs, would it work as a preconditioner for >> > my given problem and a prescribed rhs at the finest level of PCMG. Or >> > does it need some knowledge of the fine and coarser meshes to perform >> > the MG operations correctly ? 
>> > >> > All the examples I've seen using MG in petsc involve the DA and DMMG >> > objects and since I use my own mesh and corresponding discretization >> > code for an elliptic system, I'm curious about this usage. It would >> > not be terribly difficult to write my own framework to do a simple >> > V-cycle with my existing framework but since petsc already provides >> > this functionality along with different types of MG solves (with >> > verified code!), I really want to use it for my system. Any help >> > and/or pointers are welcome. >> > >> > Thanks, >> > vijay >> > >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 12:43:15 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 12:43:15 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: Matt, I have used the Hypre AMG option in the past but have not tried ML AMG before. Are there any added advantages in terms of performance/memory footprint and such between the two ? Vijay On Fri, Dec 3, 2010 at 12:36 PM, Matthew Knepley wrote: > On Fri, Dec 3, 2010 at 12:33 PM, Randall Mackie > wrote: >> >> Are there any examples that show how to use the ML PC? > > -pc_type ml > It is Algebraic Multigrid. > ?? Matt > >> >> Randy >> >> On Dec 3, 2010, at 9:02 AM, Matthew Knepley wrote: >> >> I will also note that a good intro for implementing your own might be the >> ML PC >> in Petsc. It puts the ML AMG package into the PCMG framework. >> ?? Matt >> >> On Fri, Dec 3, 2010 at 3:44 AM, Dave May wrote: >>> >>> Hey Vijay, >>> ?PCMG is generic. If you provide the operators for each level, along >>> with the restriction and prolongation, >>> you can use PCMG. It doesn't need to know about the mesh. >>> >>> You don't actually need to provide the coarse grid operators. >>> Given the fine grid operator and R and optionally P, you can use >>> Galerkin coarsening by calling >>> PCMGSetGalerkin() or via the command line arg -pc_mg_galerkin >>> Also, if you don't specify the prolongation, petsc will use P = R^T. >>> >>> >>> Cheers, >>> ?Dave >>> >>> >>> On 3 December 2010 06:02, Vijay S. Mahadevan wrote: >>> > Hi all, >>> > >>> > I was wondering whether the MG preconditioner object is generic enough >>> > to work out of the box like say ILU or SOR. ?To elaborate on this, if >>> > I can provide the number of levels, restriction and prolongation >>> > operators for each level and the system operators along with vectors >>> > allocated for solution and rhs, would it work as a preconditioner for >>> > my given problem and a prescribed rhs at the finest level of PCMG. Or >>> > does it need some knowledge of the fine and coarser meshes to perform >>> > the MG operations correctly ? >>> > >>> > All the examples I've seen using MG in petsc involve the DA and DMMG >>> > objects and since I use my own mesh and corresponding discretization >>> > code for an elliptic system, I'm curious about this usage. 
It would >>> > not be terribly difficult to write my own framework to do a simple >>> > V-cycle with my existing framework but since petsc already provides >>> > this functionality along with different types of MG solves (with >>> > verified code!), I really want to use it for my system. Any help >>> > and/or pointers are welcome. >>> > >>> > Thanks, >>> > vijay >>> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From jed at 59A2.org Fri Dec 3 12:47:57 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 19:47:57 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: <9CED80B4-E3D7-48C3-839C-6D754ECCD455@gmail.com> Message-ID: On Fri, Dec 3, 2010 at 19:43, Vijay S. Mahadevan wrote: > Are there any added advantages in terms of > performance/memory footprint and such between the two ? > Generally, ML takes less memory and it returns coarse level operators to PETSc so you have a lot more flexibility (you can use all of PETSc's preconditioners as smoothers, you can control each level independently, and you have lots of options to solve the coarse-level problem). ML needs fewer levels and has lower setup costs. BoomerAMG usually produces a more robust hierarchy so it works for some problems that ML does not. It is basically a black box so you only have the flexibility that they specifically provided (rather than everything in PETSc plus whatever you might want to do). Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 13:20:54 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 13:20:54 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > PETSc cannot magically create intermediate-level shell operators. If you > want to do everything matrix-free, then you have to provide everything that > needs to be "smart": restriction/interpolation, smoothers, and residuals. This is fine and I am more than happy to hand these to Petsc. I do have the restriction/prolongation matrices explicitly but only the operators are matrix-free for now. So if I start off with a matrix-free PC operator, how exactly do I provide the shell matrix operators for all the other levels ? I do not see any routine to enable this and my conclusion is that petsc does find it using the fine grid operator at hand using R^T*A*R transformation. If this is the case for all levels, then it avalanches into a lot of fine-grid matrix vector products. I hope this is not the way it is done. My other line of thinking is to directly manipulate the KSP/PC operators at every level and replace them with the correct shell matrices. But I am not sure what is the recommended procedure here. All comments/suggestions welcome. Vijay On Fri, Dec 3, 2010 at 12:34 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 19:29, Vijay S. Mahadevan wrote: >> >> Jed and Dave, thanks for the explanation. Now I do understand that the >> MG PC is generic if I have restriction/prolongation operators for >> every level. 
But I do not have the fine grid operator on hand >> explicitly (only a shell matrix with MatMult) and technically all my >> coarser grid operators will also be matrix-free. I was originally >> planning to hand these shell matrices to petsc as the coarse >> operators but can petsc do this by itself with only access to the fine >> grid operator ? > > PETSc cannot magically create intermediate-level shell operators. ?If you > want to do everything matrix-free, then you have to provide everything that > needs to be "smart": restriction/interpolation, smoothers, and residuals. > Jed From jed at 59A2.org Fri Dec 3 13:27:01 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 20:27:01 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 20:20, Vijay S. Mahadevan wrote: > This is fine and I am more than happy to hand these to Petsc. I do > have the restriction/prolongation matrices explicitly but only the > operators are matrix-free for now. So if I start off with a > matrix-free PC operator, how exactly do I provide the shell matrix > operators for all the other levels ? > PCMGSetResidual, PCMGGetSmoother and set it to whatever type you want (maybe your custom one). > I do not see any routine to enable this and my conclusion is that > petsc does find it using the fine grid operator at hand using R^T*A*R > transformation. > Galerkin coarse operators really need to be formed (not applied as a product) to make sense. PCMG calls MatPtAP(), Galerkin coarse operators will not work unless this is implemented. You want to provide the matrix yourself. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 13:53:32 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 13:53:32 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > Galerkin coarse operators really need to be formed (not applied as a > product) to make sense. PCMG calls MatPtAP(), Galerkin coarse operators > will not work unless this is implemented. You want to provide the matrix > yourself. So this does require the fine grid matrix to be formed explicitly and handed over to the top level KSP solver ? Since I start with a shell matrix and can only provide action of the fine operator on a vector, this does pose a problem. I will implement based on the shell matrix for now and see at which point the requirements are stopping me. Thanks for all the help Jed. I will post here again if I have more questions. Vijay On Fri, Dec 3, 2010 at 1:27 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 20:20, Vijay S. Mahadevan wrote: >> >> This is fine and I am more than happy to hand these to Petsc. I do >> have the restriction/prolongation matrices explicitly but only the >> operators are matrix-free for now. So if I start off with a >> matrix-free PC operator, how exactly do I provide the shell matrix >> operators for all the other levels ? > > PCMGSetResidual, PCMGGetSmoother and set it to whatever type you want (maybe > your custom one). > >> >> I do not see any routine to enable this and my conclusion is that >> petsc does find it using the fine grid operator at hand using R^T*A*R >> transformation. > > Galerkin coarse operators really need to be formed (not applied as a > product) to make sense. ?PCMG calls MatPtAP(), Galerkin coarse operators > will not work unless this is implemented. ?You want to provide the matrix > yourself. 
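For reference, when the fine-grid operator is assembled, the triple product that PCMG forms internally for a Galerkin coarse level can also be built by hand with MatPtAP(); a two-line sketch, where Afine (an assembled fine-grid matrix) and P (the interpolation from the coarse level) are hypothetical names:

Mat Acoarse;
ierr = MatPtAP(Afine,P,MAT_INITIAL_MATRIX,2.0,&Acoarse);CHKERRQ(ierr);  /* Acoarse = P^T * Afine * P */

This is exactly the operation that cannot be performed when Afine is only available as a shell, which is why the coarse operators have to be supplied some other way in the matrix-free case.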
> Jed From jed at 59A2.org Fri Dec 3 13:56:20 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 20:56:20 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 20:53, Vijay S. Mahadevan wrote: > So this does require the fine grid matrix to be formed explicitly and > handed over to the top level KSP solver ? > If you want to use Galerkin coarse operators, then you have to assemble the fine-grid matrix. This is an algorithmic issue, not a matter of PETSc's API or something like that. If you can provide matrix-free residuals/smoothers, then you don't need to use the Galerkin procedure to build coarse operators. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Fri Dec 3 14:09:01 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 14:09:01 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: > If you want to use Galerkin coarse operators, then you have to assemble the > fine-grid matrix. This is an algorithmic issue, not a matter of PETSc's API > or something like that. If you can provide matrix-free residuals/smoothers, > then you don't need to use the Galerkin procedure to build coarse operators. Ah, I misunderstood your explanation earlier. If I do provide the restriction/prolongation along with a fine-grid shell matrix and opt to not use Galerkin MG, then how do I provide the coarse grid operators to petsc ? I also just remembered from one of your earlier posts that you mentioned the use of non-Galerkin coarse operators requires a coarse mesh to be provided. Since my code does not use DMMG at all but is rather based on an unstructured grid setting using libMesh, I do not know how to proceed here. And I dont quite get what a matrix-free residual is.. Wouldn?t PCMGDefaultResidual compute the residual with just MatMult operation defined (b-Ax) for every level ? Why do I need a custom residual operator ? On Fri, Dec 3, 2010 at 1:56 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 20:53, Vijay S. Mahadevan wrote: >> >> So this does require the fine grid matrix to be formed explicitly and >> handed over to the top level KSP solver ? > > If you want to use Galerkin coarse operators, then you have to assemble the > fine-grid matrix. ?This is an algorithmic issue, not a matter of PETSc's API > or something like that. ?If you can provide matrix-free residuals/smoothers, > then you don't need to use the Galerkin procedure to build coarse operators. > Jed From jed at 59A2.org Fri Dec 3 14:16:01 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 21:16:01 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: > Ah, I misunderstood your explanation earlier. If I do provide the > restriction/prolongation along with a fine-grid shell matrix and opt > to not use Galerkin MG, then how do I provide the coarse grid > operators to petsc? > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > I also just remembered from one of your earlier > posts that you mentioned the use of non-Galerkin coarse operators > requires a coarse mesh to be provided. > No, this is not required. PCMG's interface is purely algebraic, you do not need to use DMMG or otherwise provide a "mesh". You have to provide coarse-level operators (as described above). This is all in the users manual. 
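Putting those pieces together, a sketch of the per-level setup for a matrix-free hierarchy might look like the following; the shell operators Ashell[l] (needing only MatMult), the interpolation matrices P[l], an assembled coarsest-level matrix Acoarse, and the vectors b and x are assumed to be provided by the application, and nlevels is only illustrative:

KSP            ksp,smooth,coarse;
PC             pc;
PetscInt       l,nlevels = 4;              /* illustrative */
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,Ashell[nlevels-1],Ashell[nlevels-1],SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
ierr = PCMGSetLevels(pc,nlevels,PETSC_NULL);CHKERRQ(ierr);
for (l=1; l<nlevels; l++) {
  ierr = PCMGSetInterpolation(pc,l,P[l]);CHKERRQ(ierr);                /* level l-1 -> level l */
  ierr = PCMGGetSmoother(pc,l,&smooth);CHKERRQ(ierr);
  ierr = KSPSetOperators(smooth,Ashell[l],Ashell[l],SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = PCMGSetResidual(pc,l,PCMGDefaultResidual,Ashell[l]);CHKERRQ(ierr);
}
/* coarsest level: an assembled operator, so that a direct solver or an AMG
   such as BoomerAMG can be used there */
ierr = PCMGGetCoarseSolve(pc,&coarse);CHKERRQ(ierr);
ierr = KSPSetOperators(coarse,Acoarse,Acoarse,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Because the level operators are shells, the level smoothers must be something that needs only MatMult, for example Krylov smoothing selected with -mg_levels_ksp_type gmres -mg_levels_pc_type none; the default smoother settings generally assume an assembled matrix.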
> And I dont quite get what a matrix-free residual is.. Wouldn?t > PCMGDefaultResidual compute the residual with just MatMult operation > defined (b-Ax) for every level ? Why do I need a custom residual > operator ? > If you have wrapped your coarse-level operator in MatShell, then you can just pass that in and use PCMGDefaultResidual. Also from the users manual: *The residual() function can be set to be PCMGDefaultResidual() if one?s operator is stored in a Mat format.* *In certain circumstances, where it is much cheaper to calculate the residual directly, rather than through the* *usual formula b ? Ax, the user may wish to provide an alternative.* Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylbar.vainbot at gmail.com Fri Dec 3 14:25:58 2010 From: sylbar.vainbot at gmail.com (Sylvain Barbot) Date: Fri, 3 Dec 2010 12:25:58 -0800 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Hi all, Very interesting discussion. It would be great to have available a simple multi-grid example using matrix-free methods. I think it would greatly clarify the existing documentation about the multi-grid tools in Petsc. I suggest showing a simple 1-D linear ODE solved with multigrid, so that one can focus on the architecture. Existing documentation is clear only retrospectively to most of us. Best wishes, Sylvain 2010/12/3 Jed Brown : > On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: >> >> Ah, I misunderstood your explanation earlier. If I do provide the >> restriction/prolongation along with a fine-grid shell matrix and opt >> to not use Galerkin MG, then how do I provide the coarse grid >> operators to petsc? > > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > >> >> I also just remembered from one of your earlier >> posts that you mentioned the use of non-Galerkin coarse operators >> requires a coarse mesh to be provided. > > No, this is not required. PCMG's interface is purely algebraic, you do not > need to use DMMG or otherwise provide a "mesh". You have to provide > coarse-level operators (as described above). This is all in the users > manual. > >> >> And I dont quite get what a matrix-free residual is.. Wouldn?t >> PCMGDefaultResidual compute the residual with just MatMult operation >> defined (b-Ax) for every level ? Why do I need a custom residual >> operator ? > > If you have wrapped your coarse-level operator in MatShell, then you can > just pass that in and use PCMGDefaultResidual. Also from the users manual: > The residual() function can be set to be PCMGDefaultResidual() if one's > operator is stored in a Mat format. > In certain circumstances, where it is much cheaper to calculate the residual > directly, rather than through the > usual formula b - Ax, the user may wish to provide an alternative. > Jed From vijay.m at gmail.com Fri Dec 3 14:37:35 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Fri, 3 Dec 2010 14:37:35 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: The custom residual makes sense now. I can find the residual itself in the same way my matrix application works and this should result in some savings.. Jed, thanks a ton for these detailed explanations. I think I understand enough to get going with this. If I hit a roadblock, I will post a question here. Thanks and have a great day. Vijay On Fri, Dec 3, 2010 at 2:16 PM, Jed Brown wrote: > On Fri, Dec 3, 2010 at 21:09, Vijay S. 
Mahadevan wrote: >> >> Ah, I misunderstood your explanation earlier. If I do provide the >> restriction/prolongation along with a fine-grid shell matrix and opt >> to not use Galerkin MG, then how do I provide the coarse grid >> operators to petsc? > > PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). > >> >> I also just remembered from one of your earlier >> posts that you mentioned the use of non-Galerkin coarse operators >> requires a coarse mesh to be provided. > > No, this is not required. PCMG's interface is purely algebraic, you do not > need to use DMMG or otherwise provide a "mesh". You have to provide > coarse-level operators (as described above). This is all in the users > manual. > >> >> And I dont quite get what a matrix-free residual is.. Wouldn?t >> PCMGDefaultResidual compute the residual with just MatMult operation >> defined (b-Ax) for every level ? Why do I need a custom residual >> operator ? > > If you have wrapped your coarse-level operator in MatShell, then you can > just pass that in and use PCMGDefaultResidual. Also from the users manual: > The residual() function can be set to be PCMGDefaultResidual() if one's > operator is stored in a Mat format. > In certain circumstances, where it is much cheaper to calculate the residual > directly, rather than through the > usual formula b - Ax, the user may wish to provide an alternative. > Jed From balay at mcs.anl.gov Fri Dec 3 15:18:18 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 3 Dec 2010 15:18:18 -0600 (CST) Subject: [petsc-users] potential e-mail disruption due to power outage Message-ID: We are having a power outage this weekend at MCS [staring friday evening, and ending sunday evening or monday morning CST] e-mail, mailing-lists, ftp, web servers are supposed to work during this outage. But if something isn't working - and we are unable to respond by e-mail, you now know the reason. Satish From xdliang at gmail.com Fri Dec 3 16:19:31 2010 From: xdliang at gmail.com (Xiangdong Liang) Date: Fri, 3 Dec 2010 17:19:31 -0500 Subject: [petsc-users] run direct linear solver in parallel Message-ID: Hi everyone, I am wondering how I can run the direct solver in parallel. I can run my program in a single processor with direct linear solver by ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles However, when I try to run it with mpi: mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles I got error like this: [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: No support for this operation for this object type! [0]PETSC ERROR: Matrix type mpiaij symbolic LU! 
[0]PETSC ERROR: Libraries linked from /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 [0]PETSC ERROR: Configure options --with-shared --with-dynamic --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c ------------------------------------------------------- Would you like to tell me where I am doing wrong? I appreciate your help. Xiangdong From jed at 59A2.org Fri Dec 3 16:22:06 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 3 Dec 2010 23:22:06 +0100 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 23:19, Xiangdong Liang wrote: > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > 2.3.3 is very old, you should upgrade to petsc-3.1 Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 3 16:22:24 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 3 Dec 2010 16:22:24 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> You are using an ANCIENT version of PETSc but using documentation from a much newer version. You should upgrade to petsc-3.1 immediately then life will be easy :-) Barry On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > Hi everyone, > > I am wondering how I can run the direct solver in parallel. I can run > my program in a single processor with direct linear solver by > > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles > > However, when I try to run it with mpi: > > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package spooles > > I got error like this: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: No support for this operation for this object type! > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! 
> > [0]PETSC ERROR: Libraries linked from > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 > [0]PETSC ERROR: Configure options --with-shared --with-dynamic > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" > --with-superlu=1 --with-superlu-include=/usr/include/superlu > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 > --with-spooles-include=/usr/include/spooles > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c > ------------------------------------------------------- > > Would you like to tell me where I am doing wrong? I appreciate your help. > > Xiangdong From hzhang at mcs.anl.gov Fri Dec 3 19:47:48 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 3 Dec 2010 19:47:48 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> References: <65790980-1EFC-45A4-9304-B18597F049E9@mcs.anl.gov> Message-ID: Note: spooles has been out of support from its developers for more than 10 years. Recommend to use superlu_dist or mumps. Hong On Fri, Dec 3, 2010 at 4:22 PM, Barry Smith wrote: > > ?You are using an ANCIENT version of PETSc but using documentation from a much newer version. You should upgrade to petsc-3.1 immediately then life will be easy :-) > > > ? Barry > > On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> Hi everyone, >> >> I am wondering how I can run the direct solver in parallel. I can run >> my program in a single processor with direct linear solver by >> >> ./foo.out ?-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >> >> However, when I try to run it with mpi: >> >> mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >> -pc_factor_mat_solver_package spooles >> >> I got error like this: >> >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: No support for this operation for this object type! >> [0]PETSC ERROR: Matrix type mpiaij ?symbolic LU! 
>> >> [0]PETSC ERROR: Libraries linked from >> /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >> [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >> [0]PETSC ERROR: Configure options --with-shared --with-dynamic >> --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >> --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >> --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >> --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >> --with-superlu=1 --with-superlu-include=/usr/include/superlu >> --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >> --with-spooles-include=/usr/include/spooles >> --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >> --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >> [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >> ------------------------------------------------------- >> >> Would you like to tell me where I am doing wrong? I appreciate your help. >> >> Xiangdong > > From chetan.jhurani at gmail.com Mon Dec 6 21:47:36 2010 From: chetan.jhurani at gmail.com (Chetan Jhurani) Date: Mon, 6 Dec 2010 20:47:36 -0700 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output in KSPSolve. Message-ID: <4A1257D9430643B892E650250EBC86BD@spiff> Hello, I have a small code that solves a 3x3 system with and without SuperLU. If SuperLU is used, vectors x and b (in A x = b) come out flipped after KSPSolve. Am I doing something stupid? The two outputs, the code, and the configure command are pasted below. Thanks, Chetan -------------------------------------------------------------------------- Run with SuperLU: ~> lu_test -pc_factor_mat_solver_package superlu x before solve -10 -10 -10 b before solve 200 200 200 x after solve 200 200 200 b after solve 0.2 0.2 0.2 -------------------------------------------------------------------------- Run without SuperLU (default KSP, PC options) ~> lu_test x before solve -10 -10 -10 b before solve 200 200 200 x after solve 0.2 0.2 0.2 b after solve 200 200 200 -------------------------------------------------------------------------- Program: int main(int argc, char* argv[]) { PetscErrorCode ierr; ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); CHKERRQ(ierr); Mat A; ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); CHKERRQ(ierr); ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); ierr = MatShift(A, 1000); CHKERRQ(ierr); Vec x, b; ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); ierr = VecShift(x, -10); CHKERRQ(ierr); ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); ierr = VecShift(b, 200); CHKERRQ(ierr); KSP ksp; ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); printf("x before solve\n"); ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); printf("b before solve\n"); ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); CHKERRQ(ierr); ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); printf("x after solve\n"); ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); printf("b after solve\n"); ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); ierr = PetscFinalize(); CHKERRQ(ierr); } 
-------------------------------------------------------------------------- configure command: python ./config/configure.py -with-clanguage=C++ -with-debugging=no --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 --with-superlu=1 --with-hypre=1 --download-umfpack=1 --download-superlu=1 --download-hypre=1 --with-mumps=1 --download-mumps=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-blacs=1 --download-blacs=1 -with-c-support=1 -------------------------------------------------------------------------- From hzhang at mcs.anl.gov Mon Dec 6 22:57:35 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 6 Dec 2010 22:57:35 -0600 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output in KSPSolve. In-Reply-To: <4A1257D9430643B892E650250EBC86BD@spiff> References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: Chetan: Move ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); right before ierr = KSPSolve() and run your code with -pc_type lu -pc_factor_mat_solver_package superlu I got correct answer. Attached is my modified code. Hong > > I have a small code that solves a 3x3 system with and without > SuperLU. ?If SuperLU is used, vectors x and b (in A x = b) come > out flipped after KSPSolve. ?Am I doing something stupid? ?The two > outputs, the code, and the configure command are pasted below. > > Thanks, > > Chetan > > -------------------------------------------------------------------------- > > Run with SuperLU: > > ~> lu_test -pc_factor_mat_solver_package superlu > x before solve > -10 > -10 > -10 > b before solve > 200 > 200 > 200 > x after solve > 200 > 200 > 200 > b after solve > 0.2 > 0.2 > 0.2 > > -------------------------------------------------------------------------- > > Run without SuperLU (default KSP, PC options) > > ~> lu_test > x before solve > -10 > -10 > -10 > b before solve > 200 > 200 > 200 > x after solve > 0.2 > 0.2 > 0.2 > b after solve > 200 > 200 > 200 > > -------------------------------------------------------------------------- > > Program: > > int main(int argc, char* argv[]) > { > ? ?PetscErrorCode ierr; > > ? ?ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); > CHKERRQ(ierr); > > ? ?Mat A; > ? ?ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); > CHKERRQ(ierr); > ? ?ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > ? ?ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > ? ?ierr = MatShift(A, 1000); CHKERRQ(ierr); > > ? ?Vec x, b; > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); > ? ?ierr = VecShift(x, -10); CHKERRQ(ierr); > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); > ? ?ierr = VecShift(b, 200); CHKERRQ(ierr); > > ? ?KSP ksp; > ? ?ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); > ? ?ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > ? ?printf("x before solve\n"); > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > ? ?printf("b before solve\n"); > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); CHKERRQ(ierr); > ? ?ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > ? ?printf("x after solve\n"); > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > ? ?printf("b after solve\n"); > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? 
?ierr = PetscFinalize(); CHKERRQ(ierr); > } > > -------------------------------------------------------------------------- > > configure command: > > python ./config/configure.py -with-clanguage=C++ -with-debugging=no > --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 --with-superlu=1 > --with-hypre=1 --download-umfpack=1 --download-superlu=1 > --download-hypre=1 --with-mumps=1 --download-mumps=1 --with-parmetis=1 > --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 > --with-blacs=1 --download-blacs=1 -with-c-support=1 > > -------------------------------------------------------------------------- > > -------------- next part -------------- A non-text attachment was scrubbed... Name: chetan.c Type: text/x-csrc Size: 1494 bytes Desc: not available URL: From chetan.jhurani at gmail.com Mon Dec 6 23:25:52 2010 From: chetan.jhurani at gmail.com (Chetan Jhurani) Date: Mon, 6 Dec 2010 22:25:52 -0700 Subject: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - flipped output inKSPSolve. In-Reply-To: References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: <47EF0668992E46D683DA71BB94E8A485@spiff> Thanks Hong. It does work now. Chetan > -----Original Message----- > From: petsc-users-bounces at mcs.anl.gov > [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Hong Zhang > Sent: Monday, December 06, 2010 09:58 PM > To: PETSc users list > Subject: Re: [petsc-users] SuperLU_4.0 with petsc-3.1-p3 - > flipped output inKSPSolve. > > Chetan: > > Move > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > right before > ierr = KSPSolve() > and run your code with > -pc_type lu -pc_factor_mat_solver_package superlu > > I got correct answer. > Attached is my modified code. > > Hong > > > > I have a small code that solves a 3x3 system with and without > > SuperLU. ?If SuperLU is used, vectors x and b (in A x = b) come > > out flipped after KSPSolve. ?Am I doing something stupid? ?The two > > outputs, the code, and the configure command are pasted below. > > > > Thanks, > > > > Chetan > > > > > -------------------------------------------------------------- > ------------ > > > > Run with SuperLU: > > > > ~> lu_test -pc_factor_mat_solver_package superlu > > x before solve > > -10 > > -10 > > -10 > > b before solve > > 200 > > 200 > > 200 > > x after solve > > 200 > > 200 > > 200 > > b after solve > > 0.2 > > 0.2 > > 0.2 > > > > > -------------------------------------------------------------- > ------------ > > > > Run without SuperLU (default KSP, PC options) > > > > ~> lu_test > > x before solve > > -10 > > -10 > > -10 > > b before solve > > 200 > > 200 > > 200 > > x after solve > > 0.2 > > 0.2 > > 0.2 > > b after solve > > 200 > > 200 > > 200 > > > > > -------------------------------------------------------------- > ------------ > > > > Program: > > > > int main(int argc, char* argv[]) > > { > > ? ?PetscErrorCode ierr; > > > > ? ?ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); > > CHKERRQ(ierr); > > > > ? ?Mat A; > > ? ?ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 1, PETSC_NULL, &A); > > CHKERRQ(ierr); > > ? ?ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > ? ?ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > > > ? ?ierr = MatShift(A, 1000); CHKERRQ(ierr); > > > > ? ?Vec x, b; > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &x); CHKERRQ(ierr); > > ? ?ierr = VecShift(x, -10); CHKERRQ(ierr); > > > > ? ?ierr = VecCreateSeq(PETSC_COMM_SELF, 3, &b); CHKERRQ(ierr); > > ? ?ierr = VecShift(b, 200); CHKERRQ(ierr); > > > > ? ?KSP ksp; > > ? 
?ierr = KSPCreate(PETSC_COMM_SELF, &ksp); CHKERRQ(ierr); > > ? ?ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > > > ? ?printf("x before solve\n"); > > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?printf("b before solve\n"); > > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > > > ? ?ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); > CHKERRQ(ierr); > > ? ?ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > > > ? ?printf("x after solve\n"); > > ? ?ierr = VecView(x, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > ? ?printf("b after solve\n"); > > ? ?ierr = VecView(b, PETSC_VIEWER_STDOUT_SELF); CHKERRQ(ierr); > > > > ? ?ierr = PetscFinalize(); CHKERRQ(ierr); > > } > > > > > -------------------------------------------------------------- > ------------ > > > > configure command: > > > > python ./config/configure.py -with-clanguage=C++ -with-debugging=no > > --with-gnu-compilers=1 --with-mpi=1 --with-umfpack=1 > --with-superlu=1 > > --with-hypre=1 --download-umfpack=1 --download-superlu=1 > > --download-hypre=1 --with-mumps=1 --download-mumps=1 > --with-parmetis=1 > > --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 > > --with-blacs=1 --download-blacs=1 -with-c-support=1 > > > > > -------------------------------------------------------------- > ------------ > > > > > From vijay.m at gmail.com Tue Dec 7 00:42:52 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 7 Dec 2010 00:42:52 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: Jed, I stumbled onto an issue and it is probably my lack of complete understanding still. I'll try to be as clear as possible but if it is unclear, do let me know. When I use PCMG as a preconditioner to solve a fine grid system using a linear solver, how do I set the interpolation to the linear system being solved. i.e., my preconditioner starts at max_levels and hierarchy proceeds to 0 (coarsest) and my linear system is technically at max_levels+1. I have a vector length incompatibility since I cannot seem to set the prolongation to go from max_levels to max_levels+1, where the linear solver subspace resides. I am probably not setting the right projection matrices at the right levels but it is slightly confusing since the coarsest level does not take any projection matrices. Hence my question is, when PCMG proceeds with the richardson iteration at every linear iteration on finest grid problem (with say FGMRES), how does the action of the preconditioner return a vector of right length ? I thought of adding the original fine grid problem to the PCMG levels (now max_levels=max_levels+1) but this by philosophy uses PCMG itself as a solver. Does it not ? Or did I misunderstand and make a wrong assumption here ? I've been stuck on this for a while and looking at DMMG implementation didn't help with my problem either. Any help is much appreciated. Vijay On Fri, Dec 3, 2010 at 2:37 PM, Vijay S. Mahadevan wrote: > The custom residual makes sense now. I can find the residual itself in > the same way my matrix application works and this should result in > some savings.. > > Jed, thanks a ton for these detailed explanations. I think I > understand enough to get going with this. If I hit a roadblock, I will > post a question here. Thanks and have a great day. > > Vijay > > On Fri, Dec 3, 2010 at 2:16 PM, Jed Brown wrote: >> On Fri, Dec 3, 2010 at 21:09, Vijay S. Mahadevan wrote: >>> >>> Ah, I misunderstood your explanation earlier. 
If I do provide the >>> restriction/prolongation along with a fine-grid shell matrix and opt >>> to not use Galerkin MG, then how do I provide the coarse grid >>> operators to petsc? >> >> PCMGSetResidual() and PCMGGetSmoother() followed by KSPSetOperators(). >> >>> >>> I also just remembered from one of your earlier >>> posts that you mentioned the use of non-Galerkin coarse operators >>> requires a coarse mesh to be provided. >> >> No, this is not required. PCMG's interface is purely algebraic, you do not >> need to use DMMG or otherwise provide a "mesh". ?You have to provide >> coarse-level operators (as described above). ?This is all in the users >> manual. >> >>> >>> And I dont quite get what a matrix-free residual is.. Wouldn?t >>> PCMGDefaultResidual compute the residual with just MatMult operation >>> defined (b-Ax) for every level ? Why do I need a custom residual >>> operator ? >> >> If you have wrapped your coarse-level operator in MatShell, then you can >> just pass that in and use PCMGDefaultResidual. ?Also from the users manual: >> The residual() function can be set to be PCMGDefaultResidual() if one's >> operator is stored in a Mat format. >> In certain circumstances, where it is much cheaper to calculate the residual >> directly, rather than through the >> usual formula b - Ax, the user may wish to provide an alternative. >> Jed > From jed at 59A2.org Tue Dec 7 04:07:30 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 7 Dec 2010 11:07:30 +0100 Subject: [petsc-users] Is PCMG a generic PC object ? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 07:42, Vijay S. Mahadevan wrote: > When I use PCMG as a preconditioner to solve a fine grid system using > a linear solver, how do I set the interpolation to the linear system > being solved. i.e., my preconditioner starts at max_levels and > hierarchy proceeds to 0 (coarsest) and my linear system is technically > at max_levels+1. > Set nlevels to whatever you want with PCMGSetLevels, then level=nlevels-1 is the fine-level problem, you can set the pre-smoother on that level to PCNONE if you want the "down" part of your cycle to skip it. > I have a vector length incompatibility since I cannot > seem to set the prolongation to go from max_levels to max_levels+1, > where the linear solver subspace resides. > The finest level in PCMG should be the space that your KSP works in. The interpolation operator on that level maps from the next coarsest level, i.e. MatInterpolate(level[n].restrict,level[n-1].x,level[n].x); > I thought of adding the original fine grid problem to the PCMG levels > (now max_levels=max_levels+1) but this by philosophy uses PCMG itself > as a solver. Does it not ? > No, it's still a preconditioner. "Multigrid as a solver" just means "Richardson preconditioned by multigrid". Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From neckel at in.tum.de Tue Dec 7 09:17:18 2010 From: neckel at in.tum.de (Tobias Neckel) Date: Tue, 07 Dec 2010 16:17:18 +0100 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 Message-ID: <4CFE4FFE.9010602@in.tum.de> Hello, we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor problem in some of our test cases: We use seqaij matrix format and GMRES with ILU dt. Our code contained a statement using PCFactorSetUseDropTolerance() which does not exist any longer. 
So we renamed the function call to PCFactorSetDropTolerance (same signature) which has no online docu but found by google ;-) Obviously, this function has not the same functionality as our tests fail. In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available from PETSc" showed a line containing: ILU dt: ILU dt seqaij Sparsekit (table survey) This is not included in the docu of 3.1-p6 (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html ) any more. Does that mean that the ILU dt version is not supported by the (default) petsc? I also checked the changelog but did not find anything there; sorry if I missed sth. Any help or infos will be highly appreciated ;-) Thanks and best regards Tobias From zonexo at gmail.com Tue Dec 7 09:29:34 2010 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 07 Dec 2010 16:29:34 +0100 Subject: [petsc-users] Getting PETSc fortran code to compile in Eclipse with ifort In-Reply-To: <4A1257D9430643B892E650250EBC86BD@spiff> References: <4A1257D9430643B892E650250EBC86BD@spiff> Message-ID: <4CFE52DE.3050302@gmail.com> Hi, I am now trying to get my PETSc fortran code to compile and run in Eclipse photran for Linux. I am using ifort. I have a makefile which I used to compile on a linux cluster. However, I can't get it to work on Photran Eclipse. I got the error msg: **** Build of configuration Debug_Intel64 for project ibm2d_hypre **** make all make: *** No rule to make target `flux_area.o', needed by `airfoil.o'. Stop. Is there anyone with experience in this area? Thank you. Yours sincerely, TAY wee-beng From hzhang at mcs.anl.gov Tue Dec 7 10:10:26 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 7 Dec 2010 10:10:26 -0600 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: <4CFE4FFE.9010602@in.tum.de> References: <4CFE4FFE.9010602@in.tum.de> Message-ID: Tobias : Yes, we replace it with superlu's ilu with drop tolerance. To use it, configure petsc with '--download-superlu --download-parmetis' Then run your code with option -pc_type ilu -pc_factor_mat_solver_package superlu -mat_superlu_ilu_droptol <> see available options with '-help |grep superlu' Suggest to use petsc-dev for such function, because we have fixed several bugs in superlu interface. Hong > > we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor > problem in some of our test cases: We use seqaij matrix format and GMRES > with ILU dt. > > Our code contained a statement using PCFactorSetUseDropTolerance() which > does not exist any longer. So we renamed the function call to > PCFactorSetDropTolerance (same signature) which has no online docu but found > by google ;-) Obviously, this function has not the same functionality as our > tests fail. > > In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available > from PETSc" showed a line containing: > ILU dt: ILU dt ?seqaij ?Sparsekit (table survey) > > This is not included in the docu of 3.1-p6 > (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html > ) any more. Does that mean that the ILU dt version is not supported by the > (default) petsc? I also checked the changelog but did not find anything > there; sorry if I missed sth. > > Any help or infos will be highly appreciated ;-) > > Thanks and best regards > Tobias > From vijay.m at gmail.com Tue Dec 7 13:21:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 7 Dec 2010 13:21:37 -0600 Subject: [petsc-users] Is PCMG a generic PC object ? 
In-Reply-To: References: Message-ID: Jed, that worked perfectly. I had a hunch this is what's needed but glad to see all issues resolved. Again, thanks for the help. Vijay On Tue, Dec 7, 2010 at 4:07 AM, Jed Brown wrote: > On Tue, Dec 7, 2010 at 07:42, Vijay S. Mahadevan wrote: >> >> When I use PCMG as a preconditioner to solve a fine grid system using >> a linear solver, how do I set the interpolation to the linear system >> being solved. i.e., my preconditioner starts at max_levels and >> hierarchy proceeds to 0 (coarsest) and my linear system is technically >> at max_levels+1. > > Set nlevels to whatever you want with PCMGSetLevels, then level=nlevels-1 is > the fine-level problem, you can set the pre-smoother on that level to PCNONE > if you want the "down" part of your cycle to skip it. > >> >> I have a vector length incompatibility since I cannot >> seem to set the prolongation to go from max_levels to max_levels+1, >> where the linear solver subspace resides. > > The finest level in PCMG should be the space that your KSP works in. The > interpolation operator on that level maps from the next coarsest level, i.e. > MatInterpolate(level[n].restrict,level[n-1].x,level[n].x); > >> >> I thought of adding the original fine grid problem to the PCMG levels >> (now max_levels=max_levels+1) but this by philosophy uses PCMG itself >> as a solver. Does it not ? > > No, it's still a preconditioner. "Multigrid as a solver" just means > "Richardson preconditioned by multigrid". > Jed From m.skates82 at gmail.com Tue Dec 7 15:45:02 2010 From: m.skates82 at gmail.com (Nunion) Date: Tue, 7 Dec 2010 15:45:02 -0600 Subject: [petsc-users] MatMAIJ tests Message-ID: Hello all, What would be a fix to get around the issue with complex numbers for ex100.c: Tests various routines in MatMAIJ format in the src/mat/examples/tests/ directory. Currently it is not written for complex numbers. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Tue Dec 7 15:50:57 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 7 Dec 2010 22:50:57 +0100 Subject: [petsc-users] MatMAIJ tests In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 22:45, Nunion wrote: > What would be a fix to get around the issue with complex numbers for ex100.c: > Tests various routines in MatMAIJ format in the src/mat/examples/tests/ directory. > Currently it is not written for complex numbers. > The code in that test would not have to change, but the on-disk binary format would need to change so that real matrices could be loaded. Or a sister file would need to be written that did use complex. Why do you want to run this test with complex? Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.skates82 at gmail.com Tue Dec 7 20:52:20 2010 From: m.skates82 at gmail.com (Nunion) Date: Tue, 7 Dec 2010 20:52:20 -0600 Subject: [petsc-users] MatMAIJ tests In-Reply-To: References: Message-ID: Yes, creating a sister file is the approach I am taking. I am interested in alternate approaches (implementations) for the matrix multi-vector multiply for improving performance. I read that this can be achieved using the MatMult function on matrices created with MatCreateMAIJ. My problem happens to be complex.
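A minimal sketch of that MAIJ usage (again not code from the thread; the AIJ matrix A, the number of vectors dof, and the interleaved input/output vectors X and Y are placeholders, and PetscScalar is complex when PETSc is configured with --with-scalar-type=complex):

  Mat            A,Amaij;   /* A is an ordinary (Seq/MPI)AIJ matrix, m x n      */
  Vec            X,Y;       /* dof vectors interleaved: lengths n*dof and m*dof */
  PetscInt       dof = 4;   /* illustrative number of vectors                   */
  PetscErrorCode ierr;

  ierr = MatCreateMAIJ(A,dof,&Amaij);CHKERRQ(ierr);
  /* one MatMult applies A to all dof interleaved vectors at once */
  ierr = MatMult(Amaij,X,Y);CHKERRQ(ierr);
  ierr = MatDestroy(Amaij);CHKERRQ(ierr);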
On Tue, Dec 7, 2010 at 3:50 PM, Jed Brown wrote: > On Tue, Dec 7, 2010 at 22:45, Nunion wrote: > >> What would be a fix to get around the issue with complex numbers for ex100.c: >> Tests vatious routines in MatMAIJ formatin the src/mat/examples/tests/ directory. >> Currently it is not written for complex numbers. >> > > The code in that test would not have to change, but the on-disk binary > format would need to change so that real matrices could be loaded. Or a > sister file would need to be written that did use complex. Why do you want > to run this test with complex? > > Jed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neckel at in.tum.de Wed Dec 8 02:08:48 2010 From: neckel at in.tum.de (Tobias Neckel) Date: Wed, 08 Dec 2010 09:08:48 +0100 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: References: <4CFE4FFE.9010602@in.tum.de> Message-ID: <4CFF3D10.6000600@in.tum.de> Dear Hong, thanks for your quick answer. > Yes, we replace it with superlu's ilu with drop tolerance. > To use it, configure petsc with '--download-superlu --download-parmetis' > Then run your code with option > -pc_type ilu -pc_factor_mat_solver_package superlu -mat_superlu_ilu_droptol<> > > see available options with '-help |grep superlu' Ah, ok. Is there a possibility to hardwire that in the source code? We run different integration tests with different solver/pc combinations with one single executable call (currently without any options). > Suggest to use petsc-dev for such function, because we have fixed > several bugs in superlu interface. Ok, thanks. Will this functionality be available also in the next release (p7) and when is this expected (approximately)? Thanks and best regards Tobias >> we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor >> problem in some of our test cases: We use seqaij matrix format and GMRES >> with ILU dt. >> >> Our code contained a statement using PCFactorSetUseDropTolerance() which >> does not exist any longer. So we renamed the function call to >> PCFactorSetDropTolerance (same signature) which has no online docu but found >> by google ;-) Obviously, this function has not the same functionality as our >> tests fail. >> >> In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available >> from PETSc" showed a line containing: >> ILU dt: ILU dt seqaij Sparsekit (table survey) >> >> This is not included in the docu of 3.1-p6 >> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html >> ) any more. Does that mean that the ILU dt version is not supported by the >> (default) petsc? I also checked the changelog but did not find anything >> there; sorry if I missed sth. >> >> Any help or infos will be highly appreciated ;-) >> >> Thanks and best regards >> Tobias >> From jakub.pola at gmail.com Thu Dec 9 13:44:22 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Thu, 09 Dec 2010 20:44:22 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! Message-ID: <1291923862.2227.14.camel@desktop> Hello, I was trying to use pets library with my GPU (GTX 460 ) and see how it works. Unfortunatelly I cant run any example with -vec_type cuda. Thisis my log from executing ./ex2 -vec_type cuda [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external package needed for type seehttp://www.mcs.anl.gov/petsc/petsc-as/documentation/installation.html#external! 
[0]PETSC ERROR: Unknown vector type: cuda! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 6, Tue Nov 16 17:02:32 CST 2010 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex2 on a linux-gnu named desktop by kuba Thu Dec 9 20:20:38 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-3.1-p6/linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Thu Dec 9 19:50:31 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: VecSetType() line 46 in src/vec/vec/interface/vecreg.c [0]PETSC ERROR: VecSetTypeFromOptions_Private() line 1335 in src/vec/vec/interface/vector.c [0]PETSC ERROR: VecSetFromOptions() line 1372 in src/vec/vec/interface/vector.c [0]PETSC ERROR: main() line 127 in src/ksp/ksp/examples/tutorials/ex2.c application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0[unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0 I configured my pets with following command: petsc-3.1-p6$ ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no Everything went ok. Then I compiled the library with make PETSC_DIR=/home/kuba/External/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug all and after that tests make PETSC_DIR=/home/kuba/External/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug test Tests were performed successfully. I have cuda configured in LD_LIBRARY_PATH: ldconfig -p | grep cuda libicudata.so.42 (ELF) => /usr/lib/libicudata.so.42 libcusparse.so.3 (libc6) => /usr/local/cuda/lib/libcusparse.so.3 libcusparse.so (libc6) => /usr/local/cuda/lib/libcusparse.so libcurand.so.3 (libc6) => /usr/local/cuda/lib/libcurand.so.3 libcurand.so (libc6) => /usr/local/cuda/lib/libcurand.so libcufft.so.3 (libc6) => /usr/local/cuda/lib/libcufft.so.3 libcufft.so (libc6) => /usr/local/cuda/lib/libcufft.so libcudart.so.3 (libc6) => /usr/local/cuda/lib/libcudart.so.3 libcudart.so (libc6) => /usr/local/cuda/lib/libcudart.so libcuda.so.1 (libc6) => /usr/lib/nvidia-current/libcuda.so.1 libcuda.so (libc6) => /usr/lib/nvidia-current/libcuda.so libcublas.so.3 (libc6) => /usr/local/cuda/lib/libcublas.so.3 libcublas.so (libc6) => /usr/local/cuda/lib/libcublas.so and I have cuda directories in PATH: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/kuba/bin:/usr/local/cuda/bin:/usr/local/cuda/include:/usr/local/cuda/lib:/usr/local/cuda/lib64" Could you please help me with that. My final destination is to run CG and BICGSTAB solvers and measure performance using sparse matrices. Thank you in advance for your help. Best regards, Jakub. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Dec 9 13:47:27 2010 From: jed at 59A2.org (Jed Brown) Date: Thu, 9 Dec 2010 20:47:27 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! 
In-Reply-To: <1291923862.2227.14.camel@desktop> References: <1291923862.2227.14.camel@desktop> Message-ID: On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > Petsc Release Version 3.1.0 You need petsc-dev for this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Thu Dec 9 17:07:45 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:07:45 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> Message-ID: <1291936065.2227.31.camel@desktop> Hi again, I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 I have problems with compiling the library. I did following steps: configured pets: ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust then make: make PETSC_DIR=/home/kuba/External/petsc-dev PETSC_ARCH=arch-linux-gnu-c-debug all here I have problems because compilator says that /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal error: iterator: No such file or directory But actually it is as a symbolic link: ls -l /usr/local/cuda/include/ gives lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust -> /home/kuba/External/thrust/thrust/ and kuba at desktop:~/External/thrust/thrust/iterator$ ls -l razem 96 -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h kuba at desktop:~/External/thrust/thrust/iterator$ Have somebody faced this kind of problem? Here it is compilation log to first error kuba at desktop:~/External/petsc-dev$ make PETSC_DIR=/home/kuba/External/petsc-dev PETSC_ARCH=arch-linux-gnu-c-debug all ========================================== See documentation/faq.html and documentation/bugreporting.html for help with installation problems. 
Please send EVERYTHING printed out below when reporting problems To subscribe to the PETSc announcement list, send mail to majordomo at mcs.anl.gov with the message: subscribe petsc-announce To subscribe to the PETSc users mailing list, send mail to majordomo at mcs.anl.gov with the message: subscribe petsc-users ========================================== On czw, 9 gru 2010, 23:56:38 CET on desktop Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux ----------------------------------------- Using PETSc directory: /home/kuba/External/petsc-dev Using PETSc arch: arch-linux-gnu-c-debug ----------------------------------------- PETSC_VERSION_RELEASE 0 PETSC_VERSION_MAJOR 3 PETSC_VERSION_MINOR 1 PETSC_VERSION_SUBMINOR 0 PETSC_VERSION_PATCH 6 PETSC_VERSION_DATE "Mar, 25, 2010" PETSC_VERSION_PATCH_DATE "unknown" PETSC_VERSION_HG "unknown" PETSC_VERSION_DATE_HG "unknown" PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ ----------------------------------------- Using configure Options: --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust Using configuration flags: #define INCLUDED_PETSCCONF_H #define IS_COLORING_MAX 65535 #define STDC_HEADERS 1 #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT #define PETSC_UINTPTR_T uintptr_t #define PETSC_HAVE_PTHREAD 1 #define PETSC_STATIC_INLINE static inline #define PETSC_REPLACE_DIR_SEPARATOR '\\' #define PETSC_RESTRICT __restrict__ #define PETSC_HAVE_MPI 1 #define PETSC_USE_SINGLE_LIBRARY 1 #define PETSC_USE_SOCKET_VIEWER 1 #define PETSC_HAVE_THRUST 1 #define PETSC_LIB_DIR "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" #define PETSC_HAVE_FORTRAN 1 #define PETSC_HAVE_SOWING 1 #define PETSC_SLSUFFIX "" #define PETSC_FUNCTION_NAME_CXX __func__ #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 #define PETSC_UNUSED #define PETSC_HAVE_CUDA 1 #define PETSC_FUNCTION_NAME_C __func__ #define PETSC_HAVE_C2HTML 1 #define PETSC_HAVE_VALGRIND 1 #define PETSC_HAVE_BUILTIN_EXPECT 1 #define PETSC_DIR_SEPARATOR '/' #define PETSC_PATH_SEPARATOR ':' #define PETSC_HAVE_X11 1 #define PETSC_HAVE_CUSP 1 #define PETSC_Prefetch(a,b,c) #define PETSC_HAVE_BLASLAPACK 1 #define PETSC_HAVE_STRING_H 1 #define PETSC_HAVE_SYS_TYPES_H 1 #define PETSC_HAVE_ENDIAN_H 1 #define PETSC_HAVE_SYS_PROCFS_H 1 #define PETSC_HAVE_DLFCN_H 1 #define PETSC_HAVE_STDINT_H 1 #define PETSC_HAVE_LINUX_KERNEL_H 1 #define PETSC_HAVE_TIME_H 1 #define PETSC_HAVE_MATH_H 1 #define PETSC_HAVE_STDLIB_H 1 #define PETSC_HAVE_SYS_PARAM_H 1 #define PETSC_HAVE_SYS_SOCKET_H 1 #define PETSC_HAVE_UNISTD_H 1 #define PETSC_HAVE_SYS_WAIT_H 1 #define PETSC_HAVE_LIMITS_H 1 #define PETSC_HAVE_SYS_UTSNAME_H 1 #define PETSC_HAVE_NETINET_IN_H 1 #define PETSC_HAVE_FENV_H 1 #define PETSC_HAVE_FLOAT_H 1 #define PETSC_HAVE_SEARCH_H 1 #define PETSC_HAVE_SYS_SYSINFO_H 1 #define PETSC_HAVE_SYS_RESOURCE_H 1 #define PETSC_HAVE_SYS_TIMES_H 1 #define PETSC_HAVE_NETDB_H 1 #define PETSC_HAVE_MALLOC_H 1 #define PETSC_HAVE_PWD_H 1 #define PETSC_HAVE_FCNTL_H 1 #define PETSC_HAVE_STRINGS_H 1 #define PETSC_HAVE_MEMORY_H 1 #define PETSC_TIME_WITH_SYS_TIME 1 #define PETSC_HAVE_SYS_TIME_H 1 #define PETSC_USING_F90 1 #define PETSC_HAVE_RTLD_NOW 1 #define PETSC_HAVE_RTLD_LOCAL 1 #define PETSC_HAVE_RTLD_LAZY 1 #define PETSC_C_STATIC_INLINE static inline #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 #define PETSC_HAVE_CXX_NAMESPACE 1 #define PETSC_HAVE_RTLD_GLOBAL 1 
#define PETSC_C_RESTRICT __restrict__ #define PETSC_CXX_RESTRICT __restrict__ #define PETSC_CXX_STATIC_INLINE static inline #define PETSC_HAVE_LIBCUBLAS 1 #define PETSC_HAVE_LIBCUDART 1 #define PETSC_HAVE_LIBDL 1 #define PETSC_HAVE_LIBFBLAS 1 #define PETSC_HAVE_LIBFLAPACK 1 #define PETSC_HAVE_ERF 1 #define PETSC_HAVE_LIBCUFFT 1 #define PETSC_HAVE_LIBRT 1 #define PETSC_ARCH "arch-linux-gnu-c-debug" #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" #define PETSC_DIR "/home/kuba/External/petsc-dev" #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" #define HAVE_GZIP 1 #define PETSC_CLANGUAGE_C 1 #define PETSC_USE_EXTERN_CXX #define PETSC_USE_ERRORCHECKING 1 #define PETSC_MISSING_DREAL 1 #define PETSC_SIZEOF_MPI_COMM 4 #define PETSC_BITS_PER_BYTE 8 #define PETSC_SIZEOF_MPI_FINT 4 #define PETSC_SIZEOF_VOID_P 4 #define PETSC_RETSIGTYPE void #define PETSC_HAVE_CXX_COMPLEX 1 #define PETSC_SIZEOF_LONG 4 #define PETSC_USE_FORTRANKIND 1 #define PETSC_SIZEOF_SIZE_T 4 #define PETSC_SIZEOF_CHAR 1 #define PETSC_SIZEOF_DOUBLE 8 #define PETSC_SIZEOF_FLOAT 4 #define PETSC_HAVE_C99_COMPLEX 1 #define PETSC_SIZEOF_INT 4 #define PETSC_SIZEOF_LONG_LONG 8 #define PETSC_SIZEOF_SHORT 2 #define PETSC_HAVE_STRCASECMP 1 #define PETSC_HAVE_POPEN 1 #define PETSC_HAVE_SIGSET 1 #define PETSC_HAVE_GETWD 1 #define PETSC_HAVE_VSNPRINTF 1 #define PETSC_HAVE_TIMES 1 #define PETSC_HAVE_DLSYM 1 #define PETSC_HAVE_SNPRINTF 1 #define PETSC_HAVE_GETPWUID 1 #define PETSC_HAVE_GETHOSTBYNAME 1 #define PETSC_HAVE_SLEEP 1 #define PETSC_HAVE_DLERROR 1 #define PETSC_HAVE_FORK 1 #define PETSC_HAVE_RAND 1 #define PETSC_HAVE_GETTIMEOFDAY 1 #define PETSC_HAVE_DLCLOSE 1 #define PETSC_HAVE_UNAME 1 #define PETSC_HAVE_GETHOSTNAME 1 #define PETSC_HAVE_MKSTEMP 1 #define PETSC_HAVE_SIGACTION 1 #define PETSC_HAVE_DRAND48 1 #define PETSC_HAVE_NANOSLEEP 1 #define PETSC_HAVE_VA_COPY 1 #define PETSC_HAVE_CLOCK 1 #define PETSC_HAVE_ACCESS 1 #define PETSC_HAVE_SIGNAL 1 #define PETSC_HAVE_USLEEP 1 #define PETSC_HAVE_GETRUSAGE 1 #define PETSC_HAVE_VFPRINTF 1 #define PETSC_HAVE_MEMALIGN 1 #define PETSC_HAVE_GETDOMAINNAME 1 #define PETSC_HAVE_TIME 1 #define PETSC_HAVE_LSEEK 1 #define PETSC_HAVE_SOCKET 1 #define PETSC_HAVE_SYSINFO 1 #define PETSC_HAVE_READLINK 1 #define PETSC_HAVE_REALPATH 1 #define PETSC_HAVE_DLOPEN 1 #define PETSC_HAVE_MEMMOVE 1 #define PETSC_HAVE__GFORTRAN_IARGC 1 #define PETSC_SIGNAL_CAST #define PETSC_HAVE_GETCWD 1 #define PETSC_HAVE_VPRINTF 1 #define PETSC_HAVE_BZERO 1 #define PETSC_HAVE_GETPAGESIZE 1 #define PETSC_LEVEL1_DCACHE_LINESIZE 64 #define PETSC_LEVEL1_DCACHE_SIZE 32768 #define PETSC_LEVEL1_DCACHE_ASSOC 8 #define PETSC_USE_PROC_FOR_SIZE 1 #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 #define PETSC_HAVE_SHARED_LIBRARIES 1 #define PETSC_MEMALIGN 16 #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 #define PETSC_HAVE_GFORTRAN_IARGC 1 #define PETSC_HAVE_ISINF 1 #define PETSC_HAVE_ISNAN 1 #define PETSC_HAVE_MPI_COMM_C2F 1 #define PETSC_HAVE_MPI_LONG_DOUBLE 1 #define PETSC_HAVE_MPI_COMM_F2C 1 #define PETSC_HAVE_MPI_FINT 1 #define PETSC_HAVE_MPI_F90MODULE 1 #define PETSC_HAVE_MPI_FINALIZED 1 #define PETSC_HAVE_MPI_COMM_SPAWN 1 #define PETSC_HAVE_MPI_WIN_CREATE 1 #define PETSC_HAVE_MPIIO 1 #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 #define PETSC_HAVE_MPI_ALLTOALLW 1 #define PETSC_HAVE_MPI_IN_PLACE 1 #define PETSC_USE_INFO 1 #define PETSC_PETSC_USE_BACKWARD_LOOP 1 #define 
PETSC_Alignx(a,b) #define PETSC_USE_DEBUG 1 #define PETSC_USE_LOG 1 #define PETSC_IS_COLOR_VALUE_TYPE short #define PETSC_USE_CTABLE 1 #define PETSC_USE_GDB_DEBUGGER 1 #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { #define PETSC_CUDA_EXTERN_C_END } #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 #define PETSC_BLASLAPACK_UNDERSCORE 1 ----------------------------------------- Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ -I/usr/local/cuda/include/thrust/ Using C/C++ compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 Using Fortran include/module paths: -I/home/kuba/External/petsc-dev/include -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ -I/usr/local/cuda/include/thrust/ Using Fortran compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g ----------------------------------------- Using C/C++ linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 Using Fortran linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 Using Fortran flags: -Wall -Wno-unused-variable -g ----------------------------------------- Using libraries: -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft -lcublas -lcudart -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl ------------------------------------------ Using mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec ========================================== /bin/rm -f -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* /bin/rm -f -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES ========================================= libfast in: /home/kuba/External/petsc-dev/src libfast in: /home/kuba/External/petsc-dev/src/inline libfast in: /home/kuba/External/petsc-dev/src/sys libfast in: /home/kuba/External/petsc-dev/src/sys/viewer libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket In file included from /usr/local/cuda/include/cusp/detail/config.h:24, from /usr/local/cuda/include/cusp/memory.h:20, from /home/kuba/External/petsc-dev/include/petscsys.h:1671, from send.c:3: /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?thrust? In file included from /usr/local/cuda/include/cusp/memory.h:22, from /home/kuba/External/petsc-dev/include/petscsys.h:1671, from send.c:3: /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal error: iterator: No such file or directory compilation terminated. Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > Petsc Release Version 3.1.0 > > You need petsc-dev for this. 
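Putting the advice from the replies below together (use a petsc-dev checkout, let configure pick up CUSP and Thrust from the default CUDA include directory instead of passing --with-cusp-dir/--with-thrust-dir, and note the spelling --with-cusp, not --with-cups), the corrected configure invocation would look roughly like this, with the remaining options left as in the original attempt:

  ./config/configure.py --with-cc=gcc --with-fc=gfortran \
    --download-f-blas-lapack=1 --download-mpich=1 \
    --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-debug=no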
From balay at mcs.anl.gov Thu Dec 9 17:29:11 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 9 Dec 2010 17:29:11 -0600 (CST) Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291936065.2227.31.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: Try removing options: --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust [and use --with-cups=1 --with-thrust=1] As they are in default location - configure picks them automatically. But since they are incorrectly specified - the 'default cuda path' gets the configure tests going - but this path causes grief later.. > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > from /usr/local/cuda/include/cusp/memory.h:20, Thats suspporsed to system memory.h - not cups/memory.h Satish On Fri, 10 Dec 2010, Jakub Pola wrote: > Hi again, > > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 > I have problems with compiling the library. I did following steps: > > configured pets: > > ./config/configure.py --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > --with-thrust-dir=/usr/local/cuda/include/thrust > > then make: > make PETSC_DIR=/home/kuba/External/petsc-dev > PETSC_ARCH=arch-linux-gnu-c-debug all > > here I have problems because compilator says that > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > error: iterator: No such file or directory > > But actually it is as a symbolic link: > ls -l /usr/local/cuda/include/ > gives > lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust > -> /home/kuba/External/thrust/thrust/ > > and > > kuba at desktop:~/External/thrust/thrust/iterator$ ls -l > razem 96 > -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h > -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h > drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail > -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h > -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h > -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h > -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h > -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h > -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h > -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h > -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h > kuba at desktop:~/External/thrust/thrust/iterator$ > > > Have somebody faced this kind of problem? > > > > > Here it is compilation log to first error > > kuba at desktop:~/External/petsc-dev$ make > PETSC_DIR=/home/kuba/External/petsc-dev > PETSC_ARCH=arch-linux-gnu-c-debug all > ========================================== > > See documentation/faq.html and documentation/bugreporting.html > for help with installation problems. 
Please send EVERYTHING > printed out below when reporting problems > > To subscribe to the PETSc announcement list, send mail to > majordomo at mcs.anl.gov with the message: > subscribe petsc-announce > > To subscribe to the PETSc users mailing list, send mail to > majordomo at mcs.anl.gov with the message: > subscribe petsc-users > > ========================================== > On czw, 9 gru 2010, 23:56:38 CET on desktop > Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP > Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux > ----------------------------------------- > Using PETSc directory: /home/kuba/External/petsc-dev > Using PETSc arch: arch-linux-gnu-c-debug > ----------------------------------------- > PETSC_VERSION_RELEASE 0 > PETSC_VERSION_MAJOR 3 > PETSC_VERSION_MINOR 1 > PETSC_VERSION_SUBMINOR 0 > PETSC_VERSION_PATCH 6 > PETSC_VERSION_DATE "Mar, 25, 2010" > PETSC_VERSION_PATCH_DATE "unknown" > PETSC_VERSION_HG "unknown" > PETSC_VERSION_DATE_HG "unknown" > PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ > ----------------------------------------- > Using configure Options: --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > --with-thrust-dir=/usr/local/cuda/include/thrust > Using configuration flags: > #define INCLUDED_PETSCCONF_H > #define IS_COLORING_MAX 65535 > #define STDC_HEADERS 1 > #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT > #define PETSC_UINTPTR_T uintptr_t > #define PETSC_HAVE_PTHREAD 1 > #define PETSC_STATIC_INLINE static inline > #define PETSC_REPLACE_DIR_SEPARATOR '\\' > #define PETSC_RESTRICT __restrict__ > #define PETSC_HAVE_MPI 1 > #define PETSC_USE_SINGLE_LIBRARY 1 > #define PETSC_USE_SOCKET_VIEWER 1 > #define PETSC_HAVE_THRUST 1 > #define PETSC_LIB_DIR > "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" > #define PETSC_HAVE_FORTRAN 1 > #define PETSC_HAVE_SOWING 1 > #define PETSC_SLSUFFIX "" > #define PETSC_FUNCTION_NAME_CXX __func__ > #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 > #define PETSC_UNUSED > #define PETSC_HAVE_CUDA 1 > #define PETSC_FUNCTION_NAME_C __func__ > #define PETSC_HAVE_C2HTML 1 > #define PETSC_HAVE_VALGRIND 1 > #define PETSC_HAVE_BUILTIN_EXPECT 1 > #define PETSC_DIR_SEPARATOR '/' > #define PETSC_PATH_SEPARATOR ':' > #define PETSC_HAVE_X11 1 > #define PETSC_HAVE_CUSP 1 > #define PETSC_Prefetch(a,b,c) > #define PETSC_HAVE_BLASLAPACK 1 > #define PETSC_HAVE_STRING_H 1 > #define PETSC_HAVE_SYS_TYPES_H 1 > #define PETSC_HAVE_ENDIAN_H 1 > #define PETSC_HAVE_SYS_PROCFS_H 1 > #define PETSC_HAVE_DLFCN_H 1 > #define PETSC_HAVE_STDINT_H 1 > #define PETSC_HAVE_LINUX_KERNEL_H 1 > #define PETSC_HAVE_TIME_H 1 > #define PETSC_HAVE_MATH_H 1 > #define PETSC_HAVE_STDLIB_H 1 > #define PETSC_HAVE_SYS_PARAM_H 1 > #define PETSC_HAVE_SYS_SOCKET_H 1 > #define PETSC_HAVE_UNISTD_H 1 > #define PETSC_HAVE_SYS_WAIT_H 1 > #define PETSC_HAVE_LIMITS_H 1 > #define PETSC_HAVE_SYS_UTSNAME_H 1 > #define PETSC_HAVE_NETINET_IN_H 1 > #define PETSC_HAVE_FENV_H 1 > #define PETSC_HAVE_FLOAT_H 1 > #define PETSC_HAVE_SEARCH_H 1 > #define PETSC_HAVE_SYS_SYSINFO_H 1 > #define PETSC_HAVE_SYS_RESOURCE_H 1 > #define PETSC_HAVE_SYS_TIMES_H 1 > #define PETSC_HAVE_NETDB_H 1 > #define PETSC_HAVE_MALLOC_H 1 > #define PETSC_HAVE_PWD_H 1 > #define PETSC_HAVE_FCNTL_H 1 > #define PETSC_HAVE_STRINGS_H 1 > #define PETSC_HAVE_MEMORY_H 1 > #define PETSC_TIME_WITH_SYS_TIME 1 > #define PETSC_HAVE_SYS_TIME_H 1 > #define PETSC_USING_F90 1 > #define PETSC_HAVE_RTLD_NOW 1 > #define 
PETSC_HAVE_RTLD_LOCAL 1 > #define PETSC_HAVE_RTLD_LAZY 1 > #define PETSC_C_STATIC_INLINE static inline > #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 > #define PETSC_HAVE_CXX_NAMESPACE 1 > #define PETSC_HAVE_RTLD_GLOBAL 1 > #define PETSC_C_RESTRICT __restrict__ > #define PETSC_CXX_RESTRICT __restrict__ > #define PETSC_CXX_STATIC_INLINE static inline > #define PETSC_HAVE_LIBCUBLAS 1 > #define PETSC_HAVE_LIBCUDART 1 > #define PETSC_HAVE_LIBDL 1 > #define PETSC_HAVE_LIBFBLAS 1 > #define PETSC_HAVE_LIBFLAPACK 1 > #define PETSC_HAVE_ERF 1 > #define PETSC_HAVE_LIBCUFFT 1 > #define PETSC_HAVE_LIBRT 1 > #define PETSC_ARCH "arch-linux-gnu-c-debug" > #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" > #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" > #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" > #define PETSC_DIR "/home/kuba/External/petsc-dev" > #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" > #define HAVE_GZIP 1 > #define PETSC_CLANGUAGE_C 1 > #define PETSC_USE_EXTERN_CXX > #define PETSC_USE_ERRORCHECKING 1 > #define PETSC_MISSING_DREAL 1 > #define PETSC_SIZEOF_MPI_COMM 4 > #define PETSC_BITS_PER_BYTE 8 > #define PETSC_SIZEOF_MPI_FINT 4 > #define PETSC_SIZEOF_VOID_P 4 > #define PETSC_RETSIGTYPE void > #define PETSC_HAVE_CXX_COMPLEX 1 > #define PETSC_SIZEOF_LONG 4 > #define PETSC_USE_FORTRANKIND 1 > #define PETSC_SIZEOF_SIZE_T 4 > #define PETSC_SIZEOF_CHAR 1 > #define PETSC_SIZEOF_DOUBLE 8 > #define PETSC_SIZEOF_FLOAT 4 > #define PETSC_HAVE_C99_COMPLEX 1 > #define PETSC_SIZEOF_INT 4 > #define PETSC_SIZEOF_LONG_LONG 8 > #define PETSC_SIZEOF_SHORT 2 > #define PETSC_HAVE_STRCASECMP 1 > #define PETSC_HAVE_POPEN 1 > #define PETSC_HAVE_SIGSET 1 > #define PETSC_HAVE_GETWD 1 > #define PETSC_HAVE_VSNPRINTF 1 > #define PETSC_HAVE_TIMES 1 > #define PETSC_HAVE_DLSYM 1 > #define PETSC_HAVE_SNPRINTF 1 > #define PETSC_HAVE_GETPWUID 1 > #define PETSC_HAVE_GETHOSTBYNAME 1 > #define PETSC_HAVE_SLEEP 1 > #define PETSC_HAVE_DLERROR 1 > #define PETSC_HAVE_FORK 1 > #define PETSC_HAVE_RAND 1 > #define PETSC_HAVE_GETTIMEOFDAY 1 > #define PETSC_HAVE_DLCLOSE 1 > #define PETSC_HAVE_UNAME 1 > #define PETSC_HAVE_GETHOSTNAME 1 > #define PETSC_HAVE_MKSTEMP 1 > #define PETSC_HAVE_SIGACTION 1 > #define PETSC_HAVE_DRAND48 1 > #define PETSC_HAVE_NANOSLEEP 1 > #define PETSC_HAVE_VA_COPY 1 > #define PETSC_HAVE_CLOCK 1 > #define PETSC_HAVE_ACCESS 1 > #define PETSC_HAVE_SIGNAL 1 > #define PETSC_HAVE_USLEEP 1 > #define PETSC_HAVE_GETRUSAGE 1 > #define PETSC_HAVE_VFPRINTF 1 > #define PETSC_HAVE_MEMALIGN 1 > #define PETSC_HAVE_GETDOMAINNAME 1 > #define PETSC_HAVE_TIME 1 > #define PETSC_HAVE_LSEEK 1 > #define PETSC_HAVE_SOCKET 1 > #define PETSC_HAVE_SYSINFO 1 > #define PETSC_HAVE_READLINK 1 > #define PETSC_HAVE_REALPATH 1 > #define PETSC_HAVE_DLOPEN 1 > #define PETSC_HAVE_MEMMOVE 1 > #define PETSC_HAVE__GFORTRAN_IARGC 1 > #define PETSC_SIGNAL_CAST > #define PETSC_HAVE_GETCWD 1 > #define PETSC_HAVE_VPRINTF 1 > #define PETSC_HAVE_BZERO 1 > #define PETSC_HAVE_GETPAGESIZE 1 > #define PETSC_LEVEL1_DCACHE_LINESIZE 64 > #define PETSC_LEVEL1_DCACHE_SIZE 32768 > #define PETSC_LEVEL1_DCACHE_ASSOC 8 > #define PETSC_USE_PROC_FOR_SIZE 1 > #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 > #define PETSC_HAVE_SHARED_LIBRARIES 1 > #define PETSC_MEMALIGN 16 > #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 > #define PETSC_HAVE_GFORTRAN_IARGC 1 > #define PETSC_HAVE_ISINF 1 > #define PETSC_HAVE_ISNAN 1 > #define PETSC_HAVE_MPI_COMM_C2F 1 > #define PETSC_HAVE_MPI_LONG_DOUBLE 1 > #define 
PETSC_HAVE_MPI_COMM_F2C 1 > #define PETSC_HAVE_MPI_FINT 1 > #define PETSC_HAVE_MPI_F90MODULE 1 > #define PETSC_HAVE_MPI_FINALIZED 1 > #define PETSC_HAVE_MPI_COMM_SPAWN 1 > #define PETSC_HAVE_MPI_WIN_CREATE 1 > #define PETSC_HAVE_MPIIO 1 > #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 > #define PETSC_HAVE_MPI_ALLTOALLW 1 > #define PETSC_HAVE_MPI_IN_PLACE 1 > #define PETSC_USE_INFO 1 > #define PETSC_PETSC_USE_BACKWARD_LOOP 1 > #define PETSC_Alignx(a,b) > #define PETSC_USE_DEBUG 1 > #define PETSC_USE_LOG 1 > #define PETSC_IS_COLOR_VALUE_TYPE short > #define PETSC_USE_CTABLE 1 > #define PETSC_USE_GDB_DEBUGGER 1 > #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { > #define PETSC_CUDA_EXTERN_C_END } > #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 > #define PETSC_BLASLAPACK_UNDERSCORE 1 > ----------------------------------------- > Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > -I/usr/local/cuda/include/thrust/ > Using C/C++ > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 > Using Fortran include/module paths: > -I/home/kuba/External/petsc-dev/include > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > -I/usr/local/cuda/include/thrust/ > Using Fortran > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g > ----------------------------------------- > Using C/C++ > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -g3 > Using Fortran > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 > Using Fortran flags: -Wall -Wno-unused-variable -g > ----------------------------------------- > Using libraries: > -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc > -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft > -lcublas -lcudart > -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 > -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread > -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx > -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > ------------------------------------------ > Using > mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec > ========================================== > /bin/rm -f > -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* > /bin/rm -f > -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod > BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES > ========================================= > libfast in: /home/kuba/External/petsc-dev/src > libfast in: /home/kuba/External/petsc-dev/src/inline > libfast in: /home/kuba/External/petsc-dev/src/sys > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > from /usr/local/cuda/include/cusp/memory.h:20, > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > from send.c:3: > /usr/local/cuda/include/thrust/version.h:69: error: 
expected ?=?, ?,?, > ?;?, ?asm? or ?__attribute__? before ?thrust? > In file included from /usr/local/cuda/include/cusp/memory.h:22, > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > from send.c:3: > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > error: iterator: No such file or directory > compilation terminated. > > > > Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > > Petsc Release Version 3.1.0 > > > > You need petsc-dev for this. > > > From jed at 59A2.org Thu Dec 9 17:38:52 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 00:38:52 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291936065.2227.31.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: On Fri, Dec 10, 2010 at 00:07, Jakub Pola wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 As I said before, you need petsc-dev for this, CUDA support is not in 3.1. http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Thu Dec 9 17:44:12 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:44:12 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291938252.2227.33.camel@desktop> When I used your suggestion I got following error. kuba at desktop:~/External/petsc-dev$ ./config/configure.py --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cups=1 --with-thrust=1 =============================================================================== Configuring PETSc to compile on your system =============================================================================== TESTING: checkInclude from config.headers(config/BuildSystem/config/headers.py:82) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- PETSc CUDA support requires the CUSP and Thrust packages Rerun configure using --with-cusp-dir and --with-thrust-dir ******************************************************************************* Dnia 2010-12-09, czw o godzinie 17:29 -0600, Satish Balay pisze: > Try removing options: > --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust > [and use --with-cups=1 --with-thrust=1] > > As they are in default location - configure picks them automatically. But > since they are incorrectly specified - the 'default cuda path' gets > the configure tests going - but this path causes grief later.. > > > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > > from /usr/local/cuda/include/cusp/memory.h:20, > > > Thats suspporsed to system memory.h - not cups/memory.h > > Satish > > On Fri, 10 Dec 2010, Jakub Pola wrote: > > > Hi again, > > > > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 > > I have problems with compiling the library. 
I did following steps: > > > > configured pets: > > > > ./config/configure.py --with-cc=gcc --with-fc=gfortran > > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > > --with-thrust-dir=/usr/local/cuda/include/thrust > > > > then make: > > make PETSC_DIR=/home/kuba/External/petsc-dev > > PETSC_ARCH=arch-linux-gnu-c-debug all > > > > here I have problems because compilator says that > > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > > error: iterator: No such file or directory > > > > But actually it is as a symbolic link: > > ls -l /usr/local/cuda/include/ > > gives > > lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust > > -> /home/kuba/External/thrust/thrust/ > > > > and > > > > kuba at desktop:~/External/thrust/thrust/iterator$ ls -l > > razem 96 > > -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h > > -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h > > drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail > > -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h > > -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h > > -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h > > -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h > > -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h > > -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h > > -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h > > -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h > > kuba at desktop:~/External/thrust/thrust/iterator$ > > > > > > Have somebody faced this kind of problem? > > > > > > > > > > Here it is compilation log to first error > > > > kuba at desktop:~/External/petsc-dev$ make > > PETSC_DIR=/home/kuba/External/petsc-dev > > PETSC_ARCH=arch-linux-gnu-c-debug all > > ========================================== > > > > See documentation/faq.html and documentation/bugreporting.html > > for help with installation problems. 
Please send EVERYTHING > > printed out below when reporting problems > > > > To subscribe to the PETSc announcement list, send mail to > > majordomo at mcs.anl.gov with the message: > > subscribe petsc-announce > > > > To subscribe to the PETSc users mailing list, send mail to > > majordomo at mcs.anl.gov with the message: > > subscribe petsc-users > > > > ========================================== > > On czw, 9 gru 2010, 23:56:38 CET on desktop > > Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP > > Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux > > ----------------------------------------- > > Using PETSc directory: /home/kuba/External/petsc-dev > > Using PETSc arch: arch-linux-gnu-c-debug > > ----------------------------------------- > > PETSC_VERSION_RELEASE 0 > > PETSC_VERSION_MAJOR 3 > > PETSC_VERSION_MINOR 1 > > PETSC_VERSION_SUBMINOR 0 > > PETSC_VERSION_PATCH 6 > > PETSC_VERSION_DATE "Mar, 25, 2010" > > PETSC_VERSION_PATCH_DATE "unknown" > > PETSC_VERSION_HG "unknown" > > PETSC_VERSION_DATE_HG "unknown" > > PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ > > ----------------------------------------- > > Using configure Options: --with-cc=gcc --with-fc=gfortran > > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > > --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ > > --with-thrust-dir=/usr/local/cuda/include/thrust > > Using configuration flags: > > #define INCLUDED_PETSCCONF_H > > #define IS_COLORING_MAX 65535 > > #define STDC_HEADERS 1 > > #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT > > #define PETSC_UINTPTR_T uintptr_t > > #define PETSC_HAVE_PTHREAD 1 > > #define PETSC_STATIC_INLINE static inline > > #define PETSC_REPLACE_DIR_SEPARATOR '\\' > > #define PETSC_RESTRICT __restrict__ > > #define PETSC_HAVE_MPI 1 > > #define PETSC_USE_SINGLE_LIBRARY 1 > > #define PETSC_USE_SOCKET_VIEWER 1 > > #define PETSC_HAVE_THRUST 1 > > #define PETSC_LIB_DIR > > "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" > > #define PETSC_HAVE_FORTRAN 1 > > #define PETSC_HAVE_SOWING 1 > > #define PETSC_SLSUFFIX "" > > #define PETSC_FUNCTION_NAME_CXX __func__ > > #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 > > #define PETSC_UNUSED > > #define PETSC_HAVE_CUDA 1 > > #define PETSC_FUNCTION_NAME_C __func__ > > #define PETSC_HAVE_C2HTML 1 > > #define PETSC_HAVE_VALGRIND 1 > > #define PETSC_HAVE_BUILTIN_EXPECT 1 > > #define PETSC_DIR_SEPARATOR '/' > > #define PETSC_PATH_SEPARATOR ':' > > #define PETSC_HAVE_X11 1 > > #define PETSC_HAVE_CUSP 1 > > #define PETSC_Prefetch(a,b,c) > > #define PETSC_HAVE_BLASLAPACK 1 > > #define PETSC_HAVE_STRING_H 1 > > #define PETSC_HAVE_SYS_TYPES_H 1 > > #define PETSC_HAVE_ENDIAN_H 1 > > #define PETSC_HAVE_SYS_PROCFS_H 1 > > #define PETSC_HAVE_DLFCN_H 1 > > #define PETSC_HAVE_STDINT_H 1 > > #define PETSC_HAVE_LINUX_KERNEL_H 1 > > #define PETSC_HAVE_TIME_H 1 > > #define PETSC_HAVE_MATH_H 1 > > #define PETSC_HAVE_STDLIB_H 1 > > #define PETSC_HAVE_SYS_PARAM_H 1 > > #define PETSC_HAVE_SYS_SOCKET_H 1 > > #define PETSC_HAVE_UNISTD_H 1 > > #define PETSC_HAVE_SYS_WAIT_H 1 > > #define PETSC_HAVE_LIMITS_H 1 > > #define PETSC_HAVE_SYS_UTSNAME_H 1 > > #define PETSC_HAVE_NETINET_IN_H 1 > > #define PETSC_HAVE_FENV_H 1 > > #define PETSC_HAVE_FLOAT_H 1 > > #define PETSC_HAVE_SEARCH_H 1 > > #define PETSC_HAVE_SYS_SYSINFO_H 1 > > #define PETSC_HAVE_SYS_RESOURCE_H 1 > > #define PETSC_HAVE_SYS_TIMES_H 1 > > #define PETSC_HAVE_NETDB_H 1 > > #define PETSC_HAVE_MALLOC_H 1 > > #define PETSC_HAVE_PWD_H 1 > > #define PETSC_HAVE_FCNTL_H 1 > > #define 
PETSC_HAVE_STRINGS_H 1 > > #define PETSC_HAVE_MEMORY_H 1 > > #define PETSC_TIME_WITH_SYS_TIME 1 > > #define PETSC_HAVE_SYS_TIME_H 1 > > #define PETSC_USING_F90 1 > > #define PETSC_HAVE_RTLD_NOW 1 > > #define PETSC_HAVE_RTLD_LOCAL 1 > > #define PETSC_HAVE_RTLD_LAZY 1 > > #define PETSC_C_STATIC_INLINE static inline > > #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 > > #define PETSC_HAVE_CXX_NAMESPACE 1 > > #define PETSC_HAVE_RTLD_GLOBAL 1 > > #define PETSC_C_RESTRICT __restrict__ > > #define PETSC_CXX_RESTRICT __restrict__ > > #define PETSC_CXX_STATIC_INLINE static inline > > #define PETSC_HAVE_LIBCUBLAS 1 > > #define PETSC_HAVE_LIBCUDART 1 > > #define PETSC_HAVE_LIBDL 1 > > #define PETSC_HAVE_LIBFBLAS 1 > > #define PETSC_HAVE_LIBFLAPACK 1 > > #define PETSC_HAVE_ERF 1 > > #define PETSC_HAVE_LIBCUFFT 1 > > #define PETSC_HAVE_LIBRT 1 > > #define PETSC_ARCH "arch-linux-gnu-c-debug" > > #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" > > #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" > > #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" > > #define PETSC_DIR "/home/kuba/External/petsc-dev" > > #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" > > #define HAVE_GZIP 1 > > #define PETSC_CLANGUAGE_C 1 > > #define PETSC_USE_EXTERN_CXX > > #define PETSC_USE_ERRORCHECKING 1 > > #define PETSC_MISSING_DREAL 1 > > #define PETSC_SIZEOF_MPI_COMM 4 > > #define PETSC_BITS_PER_BYTE 8 > > #define PETSC_SIZEOF_MPI_FINT 4 > > #define PETSC_SIZEOF_VOID_P 4 > > #define PETSC_RETSIGTYPE void > > #define PETSC_HAVE_CXX_COMPLEX 1 > > #define PETSC_SIZEOF_LONG 4 > > #define PETSC_USE_FORTRANKIND 1 > > #define PETSC_SIZEOF_SIZE_T 4 > > #define PETSC_SIZEOF_CHAR 1 > > #define PETSC_SIZEOF_DOUBLE 8 > > #define PETSC_SIZEOF_FLOAT 4 > > #define PETSC_HAVE_C99_COMPLEX 1 > > #define PETSC_SIZEOF_INT 4 > > #define PETSC_SIZEOF_LONG_LONG 8 > > #define PETSC_SIZEOF_SHORT 2 > > #define PETSC_HAVE_STRCASECMP 1 > > #define PETSC_HAVE_POPEN 1 > > #define PETSC_HAVE_SIGSET 1 > > #define PETSC_HAVE_GETWD 1 > > #define PETSC_HAVE_VSNPRINTF 1 > > #define PETSC_HAVE_TIMES 1 > > #define PETSC_HAVE_DLSYM 1 > > #define PETSC_HAVE_SNPRINTF 1 > > #define PETSC_HAVE_GETPWUID 1 > > #define PETSC_HAVE_GETHOSTBYNAME 1 > > #define PETSC_HAVE_SLEEP 1 > > #define PETSC_HAVE_DLERROR 1 > > #define PETSC_HAVE_FORK 1 > > #define PETSC_HAVE_RAND 1 > > #define PETSC_HAVE_GETTIMEOFDAY 1 > > #define PETSC_HAVE_DLCLOSE 1 > > #define PETSC_HAVE_UNAME 1 > > #define PETSC_HAVE_GETHOSTNAME 1 > > #define PETSC_HAVE_MKSTEMP 1 > > #define PETSC_HAVE_SIGACTION 1 > > #define PETSC_HAVE_DRAND48 1 > > #define PETSC_HAVE_NANOSLEEP 1 > > #define PETSC_HAVE_VA_COPY 1 > > #define PETSC_HAVE_CLOCK 1 > > #define PETSC_HAVE_ACCESS 1 > > #define PETSC_HAVE_SIGNAL 1 > > #define PETSC_HAVE_USLEEP 1 > > #define PETSC_HAVE_GETRUSAGE 1 > > #define PETSC_HAVE_VFPRINTF 1 > > #define PETSC_HAVE_MEMALIGN 1 > > #define PETSC_HAVE_GETDOMAINNAME 1 > > #define PETSC_HAVE_TIME 1 > > #define PETSC_HAVE_LSEEK 1 > > #define PETSC_HAVE_SOCKET 1 > > #define PETSC_HAVE_SYSINFO 1 > > #define PETSC_HAVE_READLINK 1 > > #define PETSC_HAVE_REALPATH 1 > > #define PETSC_HAVE_DLOPEN 1 > > #define PETSC_HAVE_MEMMOVE 1 > > #define PETSC_HAVE__GFORTRAN_IARGC 1 > > #define PETSC_SIGNAL_CAST > > #define PETSC_HAVE_GETCWD 1 > > #define PETSC_HAVE_VPRINTF 1 > > #define PETSC_HAVE_BZERO 1 > > #define PETSC_HAVE_GETPAGESIZE 1 > > #define PETSC_LEVEL1_DCACHE_LINESIZE 64 > > #define PETSC_LEVEL1_DCACHE_SIZE 32768 > > #define 
PETSC_LEVEL1_DCACHE_ASSOC 8 > > #define PETSC_USE_PROC_FOR_SIZE 1 > > #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 > > #define PETSC_HAVE_SHARED_LIBRARIES 1 > > #define PETSC_MEMALIGN 16 > > #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 > > #define PETSC_HAVE_GFORTRAN_IARGC 1 > > #define PETSC_HAVE_ISINF 1 > > #define PETSC_HAVE_ISNAN 1 > > #define PETSC_HAVE_MPI_COMM_C2F 1 > > #define PETSC_HAVE_MPI_LONG_DOUBLE 1 > > #define PETSC_HAVE_MPI_COMM_F2C 1 > > #define PETSC_HAVE_MPI_FINT 1 > > #define PETSC_HAVE_MPI_F90MODULE 1 > > #define PETSC_HAVE_MPI_FINALIZED 1 > > #define PETSC_HAVE_MPI_COMM_SPAWN 1 > > #define PETSC_HAVE_MPI_WIN_CREATE 1 > > #define PETSC_HAVE_MPIIO 1 > > #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 > > #define PETSC_HAVE_MPI_ALLTOALLW 1 > > #define PETSC_HAVE_MPI_IN_PLACE 1 > > #define PETSC_USE_INFO 1 > > #define PETSC_PETSC_USE_BACKWARD_LOOP 1 > > #define PETSC_Alignx(a,b) > > #define PETSC_USE_DEBUG 1 > > #define PETSC_USE_LOG 1 > > #define PETSC_IS_COLOR_VALUE_TYPE short > > #define PETSC_USE_CTABLE 1 > > #define PETSC_USE_GDB_DEBUGGER 1 > > #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { > > #define PETSC_CUDA_EXTERN_C_END } > > #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 > > #define PETSC_BLASLAPACK_UNDERSCORE 1 > > ----------------------------------------- > > Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include > > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > > -I/usr/local/cuda/include/thrust/ > > Using C/C++ > > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 > > Using Fortran include/module paths: > > -I/home/kuba/External/petsc-dev/include > > -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include > > -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ > > -I/usr/local/cuda/include/thrust/ > > Using Fortran > > compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g > > ----------------------------------------- > > Using C/C++ > > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc > > Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing > > -Wno-unknown-pragmas -g3 > > Using Fortran > > linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 > > Using Fortran flags: -Wall -Wno-unused-variable -g > > ----------------------------------------- > > Using libraries: > > -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc > > -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft > > -lcublas -lcudart > > -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > > -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 > > -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread > > -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx > > -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > > ------------------------------------------ > > Using > > mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec > > ========================================== > > /bin/rm -f > > -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* > > /bin/rm -f > > -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod > > BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES > > ========================================= > > libfast in: /home/kuba/External/petsc-dev/src > > 
libfast in: /home/kuba/External/petsc-dev/src/inline > > libfast in: /home/kuba/External/petsc-dev/src/sys > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls > > libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket > > In file included from /usr/local/cuda/include/cusp/detail/config.h:24, > > from /usr/local/cuda/include/cusp/memory.h:20, > > > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > > from send.c:3: > > /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, > > ?;?, ?asm? or ?__attribute__? before ?thrust? > > In file included from /usr/local/cuda/include/cusp/memory.h:22, > > > > from /home/kuba/External/petsc-dev/include/petscsys.h:1671, > > from send.c:3: > > /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal > > error: iterator: No such file or directory > > compilation terminated. > > > > > > > > Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: > > > On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: > > > Petsc Release Version 3.1.0 > > > > > > You need petsc-dev for this. > > > > > > From jakub.pola at gmail.com Thu Dec 9 17:52:03 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 00:52:03 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291938723.2227.37.camel@desktop> Sorry I made mistake in writing It should be petsc-dev instead of 3.1.0 Dnia 2010-12-10, pi? o godzinie 00:38 +0100, Jed Brown pisze: > On Fri, Dec 10, 2010 at 00:07, Jakub Pola > wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp > 0.2.0 > > As I said before, you need petsc-dev for this, CUDA support is not in > 3.1. > > > http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining From bsmith at mcs.anl.gov Thu Dec 9 18:08:36 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 9 Dec 2010 18:08:36 -0600 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291938252.2227.33.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> <1291938252.2227.33.camel@desktop> Message-ID: On Dec 9, 2010, at 5:44 PM, Jakub Pola wrote: > When I used your suggestion I got following error. 
> > kuba at desktop:~/External/petsc-dev$ ./config/configure.py --with-cc=gcc > --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 > --with-cuda=1 --with-debug=no --with-cups=1 --with-thrust=1 --with-cusp NOT --with-cups > =============================================================================== > Configuring PETSc to compile on your > system > =============================================================================== > TESTING: checkInclude from > config.headers(config/BuildSystem/config/headers.py:82) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > ------------------------------------------------------------------------------- > PETSc CUDA support requires the CUSP and Thrust packages > Rerun configure using --with-cusp-dir and --with-thrust-dir > ******************************************************************************* > > > Dnia 2010-12-09, czw o godzinie 17:29 -0600, Satish Balay pisze: >> Try removing options: >> --with-cusp-dir=/usr/local/cuda/include/cusp/ --with-thrust-dir=/usr/local/cuda/include/thrust >> [and use --with-cups=1 --with-thrust=1] >> >> As they are in default location - configure picks them automatically. But >> since they are incorrectly specified - the 'default cuda path' gets >> the configure tests going - but this path causes grief later.. >> >>> In file included from /usr/local/cuda/include/cusp/detail/config.h:24, >>> from /usr/local/cuda/include/cusp/memory.h:20, >> >> >> Thats suspporsed to system memory.h - not cups/memory.h >> >> Satish >> >> On Fri, 10 Dec 2010, Jakub Pola wrote: >> >>> Hi again, >>> >>> I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp 0.2.0 >>> I have problems with compiling the library. I did following steps: >>> >>> configured pets: >>> >>> ./config/configure.py --with-cc=gcc --with-fc=gfortran >>> --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 >>> --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ >>> --with-thrust-dir=/usr/local/cuda/include/thrust >>> >>> then make: >>> make PETSC_DIR=/home/kuba/External/petsc-dev >>> PETSC_ARCH=arch-linux-gnu-c-debug all >>> >>> here I have problems because compilator says that >>> /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal >>> error: iterator: No such file or directory >>> >>> But actually it is as a symbolic link: >>> ls -l /usr/local/cuda/include/ >>> gives >>> lrwxrwxrwx 1 root root 34 2010-12-09 23:44 thrust >>> -> /home/kuba/External/thrust/thrust/ >>> >>> and >>> >>> kuba at desktop:~/External/thrust/thrust/iterator$ ls -l >>> razem 96 >>> -rw-r--r-- 1 kuba kuba 7666 2010-12-09 21:30 constant_iterator.h >>> -rw-r--r-- 1 kuba kuba 6959 2010-12-09 21:30 counting_iterator.h >>> drwxr-xr-x 3 kuba kuba 4096 2010-12-09 21:30 detail >>> -rw-r--r-- 1 kuba kuba 4376 2010-12-09 21:30 iterator_adaptor.h >>> -rw-r--r-- 1 kuba kuba 8282 2010-12-09 21:30 iterator_categories.h >>> -rw-r--r-- 1 kuba kuba 14279 2010-12-09 21:30 iterator_facade.h >>> -rw-r--r-- 1 kuba kuba 2066 2010-12-09 21:30 iterator_traits.h >>> -rw-r--r-- 1 kuba kuba 6880 2010-12-09 21:30 permutation_iterator.h >>> -rw-r--r-- 1 kuba kuba 7055 2010-12-09 21:30 reverse_iterator.h >>> -rw-r--r-- 1 kuba kuba 10089 2010-12-09 21:30 transform_iterator.h >>> -rw-r--r-- 1 kuba kuba 7348 2010-12-09 21:30 zip_iterator.h >>> kuba at desktop:~/External/thrust/thrust/iterator$ >>> >>> >>> Have somebody faced this kind of problem? 
>>> >>> >>> >>> >>> Here it is compilation log to first error >>> >>> kuba at desktop:~/External/petsc-dev$ make >>> PETSC_DIR=/home/kuba/External/petsc-dev >>> PETSC_ARCH=arch-linux-gnu-c-debug all >>> ========================================== >>> >>> See documentation/faq.html and documentation/bugreporting.html >>> for help with installation problems. Please send EVERYTHING >>> printed out below when reporting problems >>> >>> To subscribe to the PETSc announcement list, send mail to >>> majordomo at mcs.anl.gov with the message: >>> subscribe petsc-announce >>> >>> To subscribe to the PETSc users mailing list, send mail to >>> majordomo at mcs.anl.gov with the message: >>> subscribe petsc-users >>> >>> ========================================== >>> On czw, 9 gru 2010, 23:56:38 CET on desktop >>> Machine characteristics: Linux desktop 2.6.35-22-generic #35-Ubuntu SMP >>> Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux >>> ----------------------------------------- >>> Using PETSc directory: /home/kuba/External/petsc-dev >>> Using PETSc arch: arch-linux-gnu-c-debug >>> ----------------------------------------- >>> PETSC_VERSION_RELEASE 0 >>> PETSC_VERSION_MAJOR 3 >>> PETSC_VERSION_MINOR 1 >>> PETSC_VERSION_SUBMINOR 0 >>> PETSC_VERSION_PATCH 6 >>> PETSC_VERSION_DATE "Mar, 25, 2010" >>> PETSC_VERSION_PATCH_DATE "unknown" >>> PETSC_VERSION_HG "unknown" >>> PETSC_VERSION_DATE_HG "unknown" >>> PETSC_VERSION_(MAJOR,MINOR,SUBMINOR) \ >>> ----------------------------------------- >>> Using configure Options: --with-cc=gcc --with-fc=gfortran >>> --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 >>> --with-debug=no --with-cusp-dir=/usr/local/cuda/include/cusp/ >>> --with-thrust-dir=/usr/local/cuda/include/thrust >>> Using configuration flags: >>> #define INCLUDED_PETSCCONF_H >>> #define IS_COLORING_MAX 65535 >>> #define STDC_HEADERS 1 >>> #define MPIU_COLORING_VALUE MPI_UNSIGNED_SHORT >>> #define PETSC_UINTPTR_T uintptr_t >>> #define PETSC_HAVE_PTHREAD 1 >>> #define PETSC_STATIC_INLINE static inline >>> #define PETSC_REPLACE_DIR_SEPARATOR '\\' >>> #define PETSC_RESTRICT __restrict__ >>> #define PETSC_HAVE_MPI 1 >>> #define PETSC_USE_SINGLE_LIBRARY 1 >>> #define PETSC_USE_SOCKET_VIEWER 1 >>> #define PETSC_HAVE_THRUST 1 >>> #define PETSC_LIB_DIR >>> "/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib" >>> #define PETSC_HAVE_FORTRAN 1 >>> #define PETSC_HAVE_SOWING 1 >>> #define PETSC_SLSUFFIX "" >>> #define PETSC_FUNCTION_NAME_CXX __func__ >>> #define PETSC_HAVE_DOUBLE_ALIGN_MALLOC 1 >>> #define PETSC_UNUSED >>> #define PETSC_HAVE_CUDA 1 >>> #define PETSC_FUNCTION_NAME_C __func__ >>> #define PETSC_HAVE_C2HTML 1 >>> #define PETSC_HAVE_VALGRIND 1 >>> #define PETSC_HAVE_BUILTIN_EXPECT 1 >>> #define PETSC_DIR_SEPARATOR '/' >>> #define PETSC_PATH_SEPARATOR ':' >>> #define PETSC_HAVE_X11 1 >>> #define PETSC_HAVE_CUSP 1 >>> #define PETSC_Prefetch(a,b,c) >>> #define PETSC_HAVE_BLASLAPACK 1 >>> #define PETSC_HAVE_STRING_H 1 >>> #define PETSC_HAVE_SYS_TYPES_H 1 >>> #define PETSC_HAVE_ENDIAN_H 1 >>> #define PETSC_HAVE_SYS_PROCFS_H 1 >>> #define PETSC_HAVE_DLFCN_H 1 >>> #define PETSC_HAVE_STDINT_H 1 >>> #define PETSC_HAVE_LINUX_KERNEL_H 1 >>> #define PETSC_HAVE_TIME_H 1 >>> #define PETSC_HAVE_MATH_H 1 >>> #define PETSC_HAVE_STDLIB_H 1 >>> #define PETSC_HAVE_SYS_PARAM_H 1 >>> #define PETSC_HAVE_SYS_SOCKET_H 1 >>> #define PETSC_HAVE_UNISTD_H 1 >>> #define PETSC_HAVE_SYS_WAIT_H 1 >>> #define PETSC_HAVE_LIMITS_H 1 >>> #define PETSC_HAVE_SYS_UTSNAME_H 1 >>> #define PETSC_HAVE_NETINET_IN_H 1 >>> #define 
PETSC_HAVE_FENV_H 1 >>> #define PETSC_HAVE_FLOAT_H 1 >>> #define PETSC_HAVE_SEARCH_H 1 >>> #define PETSC_HAVE_SYS_SYSINFO_H 1 >>> #define PETSC_HAVE_SYS_RESOURCE_H 1 >>> #define PETSC_HAVE_SYS_TIMES_H 1 >>> #define PETSC_HAVE_NETDB_H 1 >>> #define PETSC_HAVE_MALLOC_H 1 >>> #define PETSC_HAVE_PWD_H 1 >>> #define PETSC_HAVE_FCNTL_H 1 >>> #define PETSC_HAVE_STRINGS_H 1 >>> #define PETSC_HAVE_MEMORY_H 1 >>> #define PETSC_TIME_WITH_SYS_TIME 1 >>> #define PETSC_HAVE_SYS_TIME_H 1 >>> #define PETSC_USING_F90 1 >>> #define PETSC_HAVE_RTLD_NOW 1 >>> #define PETSC_HAVE_RTLD_LOCAL 1 >>> #define PETSC_HAVE_RTLD_LAZY 1 >>> #define PETSC_C_STATIC_INLINE static inline >>> #define PETSC_HAVE_FORTRAN_UNDERSCORE 1 >>> #define PETSC_HAVE_CXX_NAMESPACE 1 >>> #define PETSC_HAVE_RTLD_GLOBAL 1 >>> #define PETSC_C_RESTRICT __restrict__ >>> #define PETSC_CXX_RESTRICT __restrict__ >>> #define PETSC_CXX_STATIC_INLINE static inline >>> #define PETSC_HAVE_LIBCUBLAS 1 >>> #define PETSC_HAVE_LIBCUDART 1 >>> #define PETSC_HAVE_LIBDL 1 >>> #define PETSC_HAVE_LIBFBLAS 1 >>> #define PETSC_HAVE_LIBFLAPACK 1 >>> #define PETSC_HAVE_ERF 1 >>> #define PETSC_HAVE_LIBCUFFT 1 >>> #define PETSC_HAVE_LIBRT 1 >>> #define PETSC_ARCH "arch-linux-gnu-c-debug" >>> #define PETSC_VERSION_DATE_HG "Thu Dec 09 20:23:16 2010 +0100" >>> #define PETSC_VERSION_BS_HG "47bec558f992b1828a074066eb6df9f5b106a6b6" >>> #define PETSC_VERSION_HG "488e1fcaa13db132861c12416293551e6e00b14e" >>> #define PETSC_DIR "/home/kuba/External/petsc-dev" >>> #define PETSC_VERSION_BS_DATE_HG "Tue Dec 07 14:41:13 2010 -0600" >>> #define HAVE_GZIP 1 >>> #define PETSC_CLANGUAGE_C 1 >>> #define PETSC_USE_EXTERN_CXX >>> #define PETSC_USE_ERRORCHECKING 1 >>> #define PETSC_MISSING_DREAL 1 >>> #define PETSC_SIZEOF_MPI_COMM 4 >>> #define PETSC_BITS_PER_BYTE 8 >>> #define PETSC_SIZEOF_MPI_FINT 4 >>> #define PETSC_SIZEOF_VOID_P 4 >>> #define PETSC_RETSIGTYPE void >>> #define PETSC_HAVE_CXX_COMPLEX 1 >>> #define PETSC_SIZEOF_LONG 4 >>> #define PETSC_USE_FORTRANKIND 1 >>> #define PETSC_SIZEOF_SIZE_T 4 >>> #define PETSC_SIZEOF_CHAR 1 >>> #define PETSC_SIZEOF_DOUBLE 8 >>> #define PETSC_SIZEOF_FLOAT 4 >>> #define PETSC_HAVE_C99_COMPLEX 1 >>> #define PETSC_SIZEOF_INT 4 >>> #define PETSC_SIZEOF_LONG_LONG 8 >>> #define PETSC_SIZEOF_SHORT 2 >>> #define PETSC_HAVE_STRCASECMP 1 >>> #define PETSC_HAVE_POPEN 1 >>> #define PETSC_HAVE_SIGSET 1 >>> #define PETSC_HAVE_GETWD 1 >>> #define PETSC_HAVE_VSNPRINTF 1 >>> #define PETSC_HAVE_TIMES 1 >>> #define PETSC_HAVE_DLSYM 1 >>> #define PETSC_HAVE_SNPRINTF 1 >>> #define PETSC_HAVE_GETPWUID 1 >>> #define PETSC_HAVE_GETHOSTBYNAME 1 >>> #define PETSC_HAVE_SLEEP 1 >>> #define PETSC_HAVE_DLERROR 1 >>> #define PETSC_HAVE_FORK 1 >>> #define PETSC_HAVE_RAND 1 >>> #define PETSC_HAVE_GETTIMEOFDAY 1 >>> #define PETSC_HAVE_DLCLOSE 1 >>> #define PETSC_HAVE_UNAME 1 >>> #define PETSC_HAVE_GETHOSTNAME 1 >>> #define PETSC_HAVE_MKSTEMP 1 >>> #define PETSC_HAVE_SIGACTION 1 >>> #define PETSC_HAVE_DRAND48 1 >>> #define PETSC_HAVE_NANOSLEEP 1 >>> #define PETSC_HAVE_VA_COPY 1 >>> #define PETSC_HAVE_CLOCK 1 >>> #define PETSC_HAVE_ACCESS 1 >>> #define PETSC_HAVE_SIGNAL 1 >>> #define PETSC_HAVE_USLEEP 1 >>> #define PETSC_HAVE_GETRUSAGE 1 >>> #define PETSC_HAVE_VFPRINTF 1 >>> #define PETSC_HAVE_MEMALIGN 1 >>> #define PETSC_HAVE_GETDOMAINNAME 1 >>> #define PETSC_HAVE_TIME 1 >>> #define PETSC_HAVE_LSEEK 1 >>> #define PETSC_HAVE_SOCKET 1 >>> #define PETSC_HAVE_SYSINFO 1 >>> #define PETSC_HAVE_READLINK 1 >>> #define PETSC_HAVE_REALPATH 1 >>> #define PETSC_HAVE_DLOPEN 1 >>> #define 
PETSC_HAVE_MEMMOVE 1 >>> #define PETSC_HAVE__GFORTRAN_IARGC 1 >>> #define PETSC_SIGNAL_CAST >>> #define PETSC_HAVE_GETCWD 1 >>> #define PETSC_HAVE_VPRINTF 1 >>> #define PETSC_HAVE_BZERO 1 >>> #define PETSC_HAVE_GETPAGESIZE 1 >>> #define PETSC_LEVEL1_DCACHE_LINESIZE 64 >>> #define PETSC_LEVEL1_DCACHE_SIZE 32768 >>> #define PETSC_LEVEL1_DCACHE_ASSOC 8 >>> #define PETSC_USE_PROC_FOR_SIZE 1 >>> #define PETSC_HAVE_DYNAMIC_LIBRARIES 1 >>> #define PETSC_HAVE_SHARED_LIBRARIES 1 >>> #define PETSC_MEMALIGN 16 >>> #define PETSC_HAVE_FORTRAN_GET_COMMAND_ARGUMENT 1 >>> #define PETSC_HAVE_GFORTRAN_IARGC 1 >>> #define PETSC_HAVE_ISINF 1 >>> #define PETSC_HAVE_ISNAN 1 >>> #define PETSC_HAVE_MPI_COMM_C2F 1 >>> #define PETSC_HAVE_MPI_LONG_DOUBLE 1 >>> #define PETSC_HAVE_MPI_COMM_F2C 1 >>> #define PETSC_HAVE_MPI_FINT 1 >>> #define PETSC_HAVE_MPI_F90MODULE 1 >>> #define PETSC_HAVE_MPI_FINALIZED 1 >>> #define PETSC_HAVE_MPI_COMM_SPAWN 1 >>> #define PETSC_HAVE_MPI_WIN_CREATE 1 >>> #define PETSC_HAVE_MPIIO 1 >>> #define PETSC_HAVE_MPI_C_DOUBLE_COMPLEX 1 >>> #define PETSC_HAVE_MPI_ALLTOALLW 1 >>> #define PETSC_HAVE_MPI_IN_PLACE 1 >>> #define PETSC_USE_INFO 1 >>> #define PETSC_PETSC_USE_BACKWARD_LOOP 1 >>> #define PETSC_Alignx(a,b) >>> #define PETSC_USE_DEBUG 1 >>> #define PETSC_USE_LOG 1 >>> #define PETSC_IS_COLOR_VALUE_TYPE short >>> #define PETSC_USE_CTABLE 1 >>> #define PETSC_USE_GDB_DEBUGGER 1 >>> #define PETSC_CUDA_EXTERN_C_BEGIN extern "C" { >>> #define PETSC_CUDA_EXTERN_C_END } >>> #define PETSC_HAVE_CUSP_SMOOTHED_AGGREGATION 1 >>> #define PETSC_BLASLAPACK_UNDERSCORE 1 >>> ----------------------------------------- >>> Using C/C++ include paths: -I/home/kuba/External/petsc-dev/include >>> -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include >>> -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ >>> -I/usr/local/cuda/include/thrust/ >>> Using C/C++ >>> compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc >>> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 >>> Using Fortran include/module paths: >>> -I/home/kuba/External/petsc-dev/include >>> -I/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include >>> -I/usr/local/cuda/include -I/usr/local/cuda/include/cusp/ >>> -I/usr/local/cuda/include/thrust/ >>> Using Fortran >>> compiler: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g >>> ----------------------------------------- >>> Using C/C++ >>> linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpicc >>> Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing >>> -Wno-unknown-pragmas -g3 >>> Using Fortran >>> linker: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpif90 >>> Using Fortran flags: -Wall -Wno-unused-variable -g >>> ----------------------------------------- >>> Using libraries: >>> -L/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib -lpetsc >>> -lX11 -Wl,-rpath,/usr/local/cuda/lib -L/usr/local/cuda/lib -lcufft >>> -lcublas -lcudart >>> -Wl,-rpath,/home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib >>> -lflapack -lfblas -L/usr/lib/gcc/i686-linux-gnu/4.4.5 >>> -L/usr/lib/i686-linux-gnu -ldl -lmpich -lopa -lmpl -lrt -lpthread >>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx >>> -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl >>> ------------------------------------------ >>> Using >>> mpiexec: /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/bin/mpiexec >>> ========================================== >>> /bin/rm -f 
>>> -rf /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib/libpetsc*.* >>> /bin/rm -f >>> -f /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/include/petsc*.mod >>> BEGINNING TO COMPILE LIBRARIES IN ALL DIRECTORIES >>> ========================================= >>> libfast in: /home/kuba/External/petsc-dev/src >>> libfast in: /home/kuba/External/petsc-dev/src/inline >>> libfast in: /home/kuba/External/petsc-dev/src/sys >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls >>> libfast in: /home/kuba/External/petsc-dev/src/sys/viewer/impls/socket >>> In file included from /usr/local/cuda/include/cusp/detail/config.h:24, >>> from /usr/local/cuda/include/cusp/memory.h:20, >>> >>> from /home/kuba/External/petsc-dev/include/petscsys.h:1671, >>> from send.c:3: >>> /usr/local/cuda/include/thrust/version.h:69: error: expected ?=?, ?,?, >>> ?;?, ?asm? or ?__attribute__? before ?thrust? >>> In file included from /usr/local/cuda/include/cusp/memory.h:22, >>> >>> from /home/kuba/External/petsc-dev/include/petscsys.h:1671, >>> from send.c:3: >>> /usr/local/cuda/include/thrust/iterator/iterator_traits.h:34: fatal >>> error: iterator: No such file or directory >>> compilation terminated. >>> >>> >>> >>> Dnia 2010-12-09, czw o godzinie 20:47 +0100, Jed Brown pisze: >>>> On Thu, Dec 9, 2010 at 20:44, Jakub Pola wrote: >>>> Petsc Release Version 3.1.0 >>>> >>>> You need petsc-dev for this. >>> >>> >>> > > From Pierre.Moinier at baesystems.com Fri Dec 10 04:36:10 2010 From: Pierre.Moinier at baesystems.com (Moinier, Pierre (UK)) Date: Fri, 10 Dec 2010 10:36:10 -0000 Subject: [petsc-users] GPU-enabled PETSc In-Reply-To: <1291938723.2227.37.camel@desktop> References: <1291923862.2227.14.camel@desktop><1291936065.2227.31.camel@desktop> <1291938723.2227.37.camel@desktop> Message-ID: <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> Hi, Can any one tell me where to find a documentation that gives details on the GPU-enabled PETSc version? I looked in the archive mailing list as well as the petsc-dev doc folder, but did not find anything. What I am looking for are how the implementation is done, what is currently done and some examples... Regards, -Pierre. ******************************************************************** This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ******************************************************************** From jakub.pola at gmail.com Fri Dec 10 05:33:37 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 12:33:37 +0100 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> Message-ID: <1291980817.2227.46.camel@desktop> After typing --with-cusp=1 and --with-thrust=1 library was compiled successfully but when I want to make tests i got following error: What could be the reason of that? I have GTX 480 with drivers ver. 260 Ubuntu 10.10 32bit system with 2GB ram. Core duo processor 2.2GHz. 
Test.log: Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] MatFDColoringCreate_SeqAIJ line 21 src/mat/impls/aij/seq/fdaij.c [0]PETSC ERROR: [0] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:02 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrindor see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [1]PETSC ERROR: likely location of problem given in stack below [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [0]PETSC ERROR: [0] MatFDColoringCreate_MPIAIJ line 24 src/mat/impls/aij/mpi/fdmpiaij.c [0]PETSC ERROR: [0] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. 
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:03 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file [1]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 Note: The EXACT line numbers in the stack are not available, [1]PETSC ERROR: INSTEAD the line number of the start of the function [1]PETSC ERROR: is given. [1]PETSC ERROR: [1] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [1]PETSC ERROR: [1] VecGetArray line 226 src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h [1]PETSC ERROR: [1] VecCreateGhostWithArray line 581 src/vec/vec/impls/mpi/pbvec.c [1]PETSC ERROR: [1] VecCreateGhost line 661 src/vec/vec/impls/mpi/pbvec.c [1]PETSC ERROR: [1] MatFDColoringCreate_MPIAIJ line 24 src/mat/impls/aij/mpi/fdmpiaij.c [1]PETSC ERROR: [1] MatFDColoringCreate line 376 src/mat/matfd/fdmatrix.c [1]PETSC ERROR: [1] DMMGSetSNES line 562 src/snes/utils/damgsnes.c [1]PETSC ERROR: [1] DMMGSetSNESLocal_Private line 933 src/snes/utils/damgsnes.c [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Signal received! [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [1]PETSC ERROR: See docs/changes/index.html for recent updates. [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [1]PETSC ERROR: See docs/index.html for manual pages. 
[1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: ./ex19 on a arch-linu named desktop by kuba Fri Dec 10 08:36:03 2010 [1]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [1]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [1]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1 [cli_1]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 src/vec/vec/impls/seq/seqcuda/veccuda.cu [0]PETSC ERROR: [0] VecGetArray line 226 src/vec/vec/interface/ftn-custom//home/kuba/External/petsc-dev/include/private/vecimpl.h [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development HG revision: 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 2010 +0100 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex5f on a arch-linu named desktop by kuba Fri Dec 10 08:36:09 2010 [0]PETSC ERROR: Libraries linked from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 --with-debug=no --with-cusp=1 --with-thrust=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) Completed test examples Dnia 2010-12-10, pi? 
o godzinie 00:38 +0100, Jed Brown pisze: > On Fri, Dec 10, 2010 at 00:07, Jakub Pola > wrote: > I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp > 0.2.0 > > As I said before, you need petsc-dev for this, CUDA support is not in > 3.1. > > > http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining From bsmith at mcs.anl.gov Fri Dec 10 08:11:29 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 10 Dec 2010 08:11:29 -0600 Subject: [petsc-users] pets-3.1-p6 with CUDA: Unknown vector type: cuda! In-Reply-To: <1291980817.2227.46.camel@desktop> References: <1291923862.2227.14.camel@desktop> <1291936065.2227.31.camel@desktop> <1291980817.2227.46.camel@desktop> Message-ID: <5E812C03-40D1-4FD5-A43C-AD405B3746E8@mcs.anl.gov> Just discovered this problem ourselves with 32 bit compiles. You likely need to pass into ./configure the flags --with-cc="gcc -malign-double" --cxx="g++ -malign-double". Nvcc secretly uses that option of its compiles so we need to use it for all the compilers. Barry On Dec 10, 2010, at 5:33 AM, Jakub Pola wrote: > After typing --with-cusp=1 and --with-thrust=1 library was compiled > successfully but when I want to make tests i got following error: > What could be the reason of that? > > I have GTX 480 with drivers ver. 260 > Ubuntu 10.10 32bit system with 2GB ram. > Core duo processor 2.2GHz. > > > Test.log: > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] VecCUDACopyFromGPU line 188 > src/vec/vec/impls/seq/seqcuda/veccuda.cu > [0]PETSC ERROR: [0] VecGetArray line 226 > src/vec/vec/impls/mpi//home/kuba/External/petsc-dev/include/private/vecimpl.h > [0]PETSC ERROR: [0] VecCreateGhostWithArray line 581 > src/vec/vec/impls/mpi/pbvec.c > [0]PETSC ERROR: [0] VecCreateGhost line 661 > src/vec/vec/impls/mpi/pbvec.c > [0]PETSC ERROR: [0] MatFDColoringCreate_SeqAIJ line 21 > src/mat/impls/aij/seq/fdaij.c > [0]PETSC ERROR: [0] MatFDColoringCreate line 376 > src/mat/matfd/fdmatrix.c > [0]PETSC ERROR: [0] DMMGSetSNES line 562 src/snes/utils/damgsnes.c > [0]PETSC ERROR: [0] DMMGSetSNESLocal_Private line 933 > src/snes/utils/damgsnes.c > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Development HG revision: > 488e1fcaa13db132861c12416293551e6e00b14e HG Date: Thu Dec 09 20:23:16 > 2010 +0100 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./ex5f on a arch-linu named desktop by kuba Fri Dec 10 > 08:36:09 2010 > [0]PETSC ERROR: Libraries linked > from /home/kuba/External/petsc-dev/arch-linux-gnu-c-debug/lib > [0]PETSC ERROR: Configure run at Fri Dec 10 08:05:40 2010 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran > --download-f-blas-lapack=1 --download-mpich=1 --with-cuda=1 > --with-debug=no --with-cusp=1 --with-thrust=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: User provided function() line 0 in unknown directory > unknown file > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) > Completed test examples > > Dnia 2010-12-10, pi? o godzinie 00:38 +0100, Jed Brown pisze: >> On Fri, Dec 10, 2010 at 00:07, Jakub Pola >> wrote: >> I downloaded Petsc 3.1.0 as well as thrust 1.4.0 and cusp >> 0.2.0 >> >> As I said before, you need petsc-dev for this, CUDA support is not in >> 3.1. >> >> >> http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining > > From bsmith at mcs.anl.gov Fri Dec 10 08:12:42 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 10 Dec 2010 08:12:42 -0600 Subject: [petsc-users] GPU-enabled PETSc In-Reply-To: <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> References: <1291923862.2227.14.camel@desktop><1291936065.2227.31.camel@desktop> <1291938723.2227.37.camel@desktop> <32845768EC63B04EB132BC2C4351B226C6B397@GLKMS2114.GREENLNK.NET> Message-ID: http://www.mcs.anl.gov/petsc/petsc-as/features/gpus.html since this is only supported in petsc-dev suggest moving any future discussion of these issues to petsc-dev at mcs.anl.gov Barry On Dec 10, 2010, at 4:36 AM, Moinier, Pierre (UK) wrote: > Hi, > > Can any one tell me where to find a documentation that gives details on > the GPU-enabled PETSc version? I looked in the archive mailing list as > well as the petsc-dev doc folder, but did not find anything. What I am > looking for are how the implementation is done, what is currently done > and some examples... > > Regards, > > -Pierre. > > > ******************************************************************** > This email and any attachments are confidential to the intended > recipient and may also be privileged. If you are not the intended > recipient please delete it from your system and notify the sender. > You should not copy it or use it for any purpose nor disclose or > distribute its contents to any other person. > ******************************************************************** > From jakub.pola at gmail.com Fri Dec 10 11:02:10 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Fri, 10 Dec 2010 18:02:10 +0100 Subject: [petsc-users] KSPBICG on GPU Message-ID: <1292000530.21002.5.camel@desktop> Hi Does anyone have a benchmark of KSPBICG solver which gives me an information about loop time, GFLOPS, memory transfer bandwidth ? 
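(One way to collect per-solve loop time and flop rates on a given machine is PETSc's stage logging together with the -log_summary option; the following is only a minimal sketch, assuming a KSP object ksp and vectors b and x are already set up, the program is run with -ksp_type bicg -log_summary, and error checking is omitted:

  PetscLogStage stage;

  PetscLogStageRegister("BiCG solve",&stage);
  PetscLogStagePush(stage);
  KSPSolve(ksp,b,x);              /* time and flops for this call are attributed to the stage */
  PetscLogStagePop();

-log_summary then reports wall time and MFlop/s for the stage and for the individual operations such as MatMult and VecDot; memory bandwidth is not printed directly and has to be estimated from those numbers and the known data sizes.)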
Thanks in advance From filippo.spiga at disco.unimib.it Fri Dec 10 12:22:33 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Fri, 10 Dec 2010 13:22:33 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage Message-ID: (sorry in advance for the cross-posting) Dear all, I recently decided to use in my program the routine "PetscOptionsSetValue" to allow to change at runtime the parameter of KSP/SNES. Of course, my code performs several operations on different KSP and SNES objects. Every object should have its own set of options (different tollerancies, different preconditioner) and I want to tune it at runtime. So "PetscOptionsSetValue" it is great. But I have a question: when I set an option, this option is still valid for all the program. Let's assume this situation (it is an example, I use fake options): PetscOptionsSetValue("A", "1"); PetscOptionsSetValue("B", "2"); PetscOptionsSetValue("C", "3"); ierr = KSPSolve(); PetscOptionsSetValue("A", "10"); PetscOptionsSetValue("B", "20"); ierr = SNESSolve(); This is what happens (if I correcly understood): - KSP runs with this presets {A=1, B=2, C=3} that are different than the defaults. - SNES runs with this presets {A=10, B=20, C=3}. So SNES runs with C=3 that is different from the default. But I would like to use the default because C=3 produces wrong errors. How I can easily reset the options' database? Thanks in advance, Cheers -- Filippo SPIGA, MSc Computer Science ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 12:46:36 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 19:46:36 +0100 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 19:22, Filippo Spiga wrote: > So SNES runs with C=3 that is different from the default. But I would like > to use the default because C=3 produces wrong errors. How I can easily reset > the options' database? You're making this much more complicated than necessary. Call KSPSetOptionsPrefix(ksp,"a_"); SNESSetOptionsPrefix(snes,"b_"); KSPSetFromOptions(ksp); SNESSetFromOptions(snes); KSPSolve(ksp,...); SNESSolve(snes,...); Then you run the program with -a_ksp_type gmres -a_pc_type asm -b_snes_monitor -b_ksp_type ibcgs -b_pc_type bjacobi or whatever. You can control all the details of each solver independently. If you *need* to control the solvers in code (usually only if you have an adaptive method where *your program* takes *active* control of the solution process), you should pull out the objects and use the API instead of strings to get what you need: SNESGetKSP(snes,&inner_ksp); KSPSetType(inner_ksp,KSPIBCGS); -------------- next part -------------- An HTML attachment was scrubbed... URL: From filippo.spiga at disco.unimib.it Fri Dec 10 12:54:51 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Fri, 10 Dec 2010 13:54:51 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: Interesting. 
This KSPSetOptionsPrefix(ksp,"a_"); PetscOptionsSetValue("a_A", "1"); PetscOptionsSetValue("a_B", "2"); PetscOptionsSetValue("a_C", "3"); ierr = KSPSolve(); SNESSetOptionsPrefix(snes,"b_"); PetscOptionsSetValue("b_A", "10"); PetscOptionsSetValue("b_B", "20"); ierr = SNESSolve(); should work because the preset will be - KSP : {A=1, B=2, C=3} - SNES : {A=10, B=20, C=PETSC_DEFAULT}. exactly as I want. I know that it is possible to use API to set parameters (KSPSetType, SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no API. Instead of mix option from command-line and API I would like to put everything in a config file, read it and use PetscOptionsSetValue in the right way. It seems reasonable to me. Anyway, every suggestion is welcome (-: Thanks a lot! -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." On Fri, Dec 10, 2010 at 1:46 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 19:22, Filippo Spiga < > filippo.spiga at disco.unimib.it> wrote: > >> So SNES runs with C=3 that is different from the default. But I would like >> to use the default because C=3 produces wrong errors. How I can easily reset >> the options' database? > > > You're making this much more complicated than necessary. Call > > KSPSetOptionsPrefix(ksp,"a_"); > SNESSetOptionsPrefix(snes,"b_"); > > KSPSetFromOptions(ksp); > SNESSetFromOptions(snes); > > KSPSolve(ksp,...); > SNESSolve(snes,...); > > > Then you run the program with > > -a_ksp_type gmres -a_pc_type asm -b_snes_monitor -b_ksp_type ibcgs > -b_pc_type bjacobi > > or whatever. You can control all the details of each solver independently. > > If you *need* to control the solvers in code (usually only if you have an > adaptive method where *your program* takes *active* control of the solution > process), you should pull out the objects and use the API instead of strings > to get what you need: > > SNESGetKSP(snes,&inner_ksp); > KSPSetType(inner_ksp,KSPIBCGS); > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 13:09:33 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 20:09:33 +0100 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 19:54, Filippo Spiga wrote: > I know that it is possible to use API to set parameters (KSPSetType, > SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no > API. Instead of mix option from command-line and API I would like to put > everything in a config file You might be interested in the -options_file command line option and PetscOptionsInsertFile. Also, any options present in ~/.petscrc .petscrc (in current directory) petscrc (in current directory) get slurped in automatically, as well as the string in the PETSC_OPTIONS environment variable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.bloy at gmail.com Fri Dec 10 15:15:13 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Fri, 10 Dec 2010 16:15:13 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? 
In-Reply-To: <4D029757.6060708@seas.upenn.edu> References: <4D029757.6060708@seas.upenn.edu> Message-ID: <4D029861.8000508@gmail.com> Hi I'm new to Petsc so excuse me if this question is naive. I'm trying to solve the following system A x = b for x. A is a sparse square matrix (2000000 by 2000000 with ~45,000,000 nonzero elements) I'm currently using ex1 as the basis for solving the system and it is working quite well. My problem is that i have a large number (~500,000) of b vectors that I would like to find solutions for. My plan is to call KSPsolve repeatedly with each b. However I wonder if there are any solvers or approaches that might benefit from the fact that my A matrix does not change. Are there any decompositions that might still be sparse that would offer a speed up? Thanks for any suggestions. Luke From jed at 59A2.org Fri Dec 10 15:18:43 2010 From: jed at 59A2.org (Jed Brown) Date: Fri, 10 Dec 2010 22:18:43 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D029861.8000508@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> Message-ID: On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: > My problem is that i have a large number (~500,000) of b vectors that I > would like to find solutions for. My plan is to call KSPsolve repeatedly > with each b. However I wonder if there are any solvers or approaches that > might benefit from the fact that my A matrix does not change. Are there any > decompositions that might still be sparse that would offer a speed up? 1. What is the high-level problem you are trying to solve? There might be a better way. 2. If you can afford the memory, a direct solve probably makes sense. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbloy at seas.upenn.edu Fri Dec 10 15:10:47 2010 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Fri, 10 Dec 2010 16:10:47 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? Message-ID: <4D029757.6060708@seas.upenn.edu> Hi I'm new to Petsc so excuse me if this question is naive. I'm trying to solve the following system A x = b for x. A is a sparse square matrix (2000000 by 2000000 with ~45,000,000 nonzero elements) I'm currently using ex1 as the basis for solving the system and it is working quite well. My problem is that i have a large number (~500,000) of b vectors that I would like to find solutions for. My plan is to call KSPsolve repeatedly with each b. However I wonder if there are any solvers or approaches that might benefit from the fact that my A matrix does not change. Are there any decompositions that might still be sparse that would offer a speed up? Thanks for any suggestions. Luke From luke.bloy at gmail.com Fri Dec 10 17:03:31 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Fri, 10 Dec 2010 18:03:31 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> Message-ID: <4D02B1C3.4040409@gmail.com> Thanks for the response. On 12/10/2010 04:18 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 22:15, Luke Bloy > wrote: > > My problem is that i have a large number (~500,000) of b vectors > that I would like to find solutions for. My plan is to call > KSPsolve repeatedly with each b. However I wonder if there are any > solvers or approaches that might benefit from the fact that my A > matrix does not change. Are there any decompositions that might > still be sparse that would offer a speed up? > > > 1. 
What is the high-level problem you are trying to solve? There > might be a better way. > I'm solving a diffusion problem. essentially I have 2,000,000 possible states for my system to be in. The system evolves based on a markov matrix M, which describes the probability the system moves from one state to another. This matrix is extremely sparse on the < 100,000,000 nonzero elements. The problem is to pump mass/energy into the system at certain states. What I'm interested in is the steady state behavior of the system. basically the dynamics can be summarized as d_{t+1} = M d_{t} + d_i Where d_t is the state vector at time t and d_i shows the states I am pumping energy into. I want to find d_t as t goes to infinity. My current approach is to solve the following system. (I-M) d = d_i I'm certainly open to any suggestions you might have. > 2. If you can afford the memory, a direct solve probably makes sense. My understanding is the inverses would generally be dense. I certainly don't have any memory to hold a 2 million by 2 million dense matrix, I have about 40G to play with. So perhaps a decomposition might work? Which might you suggest? Thanks Luke -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 17:22:58 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Dec 2010 23:22:58 +0000 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D02B1C3.4040409@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy wrote: > > Thanks for the response. > > On 12/10/2010 04:18 PM, Jed Brown wrote: > > On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: > >> My problem is that i have a large number (~500,000) of b vectors that I >> would like to find solutions for. My plan is to call KSPsolve repeatedly >> with each b. However I wonder if there are any solvers or approaches that >> might benefit from the fact that my A matrix does not change. Are there any >> decompositions that might still be sparse that would offer a speed up? > > > 1. What is the high-level problem you are trying to solve? There might be > a better way. > > I'm solving a diffusion problem. essentially I have 2,000,000 possible > states for my system to be in. The system evolves based on a markov matrix > M, which describes the probability the system moves from one state to > another. This matrix is extremely sparse on the < 100,000,000 nonzero > elements. The problem is to pump mass/energy into the system at certain > states. What I'm interested in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I am > pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > I'm certainly open to any suggestions you might have. > > 2. If you can afford the memory, a direct solve probably makes sense. > > > My understanding is the inverses would generally be dense. I certainly > don't have any memory to hold a 2 million by 2 million dense matrix, I have > about 40G to play with. So perhaps a decomposition might work? Which might > you suggest? > Try -pc_type lu -pc_mat_factor_package once you have reconfigured using --download-superlu_dist --download-mumps They are sparse LU factorization packages that might work. 
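For the many-right-hand-side case in this thread, here is a minimal sketch of that setup, assuming the assembled matrix A, the arrays of vectors b[] and x[], and their count nrhs come from the application; the solver-package option name is the one used by petsc-dev at the time of writing (check your version), and error checking is omitted:

  KSP      ksp;
  PC       pc;
  PetscInt i;

  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A,SAME_PRECONDITIONER);  /* the flag only matters if A is changed later */
  KSPSetType(ksp,KSPPREONLY);                    /* just apply the factorization, no Krylov iterations */
  KSPGetPC(ksp,&pc);
  PCSetType(pc,PCLU);                            /* sparse direct factorization */
  KSPSetFromOptions(ksp);                        /* e.g. -pc_factor_mat_solver_package superlu_dist */
  KSPSetUp(ksp);                                 /* symbolic and numeric factorization happen once here */
  for (i=0; i<nrhs; i++) {
    KSPSolve(ksp,b[i],x[i]);                     /* each solve reuses the factors: forward/back substitution only */
  }
  KSPDestroy(ksp);

Since the operators are never reset, the factorization (or, with an iterative method, the preconditioner setup) from the first setup is reused by every later KSPSolve, so only the triangular solves are paid per right-hand side. If the factors do not fit in memory, the same loop works unchanged with the default iterative KSP and a strong preconditioner.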
Matt > Thanks > Luke > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 17:30:39 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 00:30:39 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D02B1C3.4040409@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: On Sat, Dec 11, 2010 at 00:03, Luke Bloy wrote: > I'm solving a diffusion problem. essentially I have 2,000,000 possible > states for my system to be in. The system evolves based on a markov matrix > M, which describes the probability the system moves from one state to > another. This matrix is extremely sparse on the < 100,000,000 nonzero > elements. The problem is to pump mass/energy into the system at certain > states. What I'm interested in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I am > pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > So you want to do this for some 500,000 d_i? What problem are you really trying to solve? Is it really to just brute-force compute states for all these inputs? What are you doing with the resulting 500k states (all 8 terabytes of it)? Are you, for example, looking for some d_i that would change the steady state d in a certain way? 2. If you can afford the memory, a direct solve probably makes sense. > > > My understanding is the inverses would generally be dense. I certainly > don't have any memory to hold a 2 million by 2 million dense matrix, I have > about 40G to play with. So perhaps a decomposition might work? Which might > you suggest? > While inverses are almost always dense, sparse factorization is far from dense. For PDE problems factored in an optimal ordering, the memory asymptotics are n*log n in 2D and n^{4/3} in 3D. The time asymptotics are n^{3/2} and n^2 respectively. Compare to n^2 memory, n^3 time for dense. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 17:46:40 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 15:46:40 -0800 Subject: [petsc-users] global index distributed arrays Message-ID: Hi guys, I was wondering if there is an easy way of accessing the global index of a node which is not within the local and ghost node regions on every processor for DA. To be more more specific, I am trying to setup a matrix based on a three-dimensional DA. (Star stencil, width=1). For some special nodes, I need to insert nonzero values which do not fit in the local plus the ghost regions of the DA. I know that I can not use MatSetValuesStencil anymore, but I still can use MatSetValues. But I need to know the global index of those nodes. I tried to use DAGetGlobalIndices(), but that would again return the global indices of the local plus ghost nodes on current processor. I know that I could use some MPI commands to pass those indices among processors, but I was wondering if there a clean and neat way with which each processor can have access to the global index for any given 3-d index. 
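For reference, a minimal sketch of the application-ordering route that comes up in the replies below, assuming a 3-D DA called da with global sizes M, N, P, a single degree of freedom per node, and error checking omitted:

  AO       ao;
  PetscInt idx;

  idx = i + j*M + k*M*N;            /* natural (application) ordering index of node (i,j,k) */
  DAGetAO(da,&ao);                  /* the AO belongs to the DA; do not destroy it */
  AOApplicationToPetsc(ao,1,&idx);  /* idx is now the global index in the PETSc ordering */
  /* idx can then be used as a row/column index in MatSetValues() */

As noted in the replies, the AO holds a mapping for the whole grid, so this is convenient but not memory-scalable; if the special nodes are known in advance, converting just those few indices once and storing them is usually enough.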
Thank, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 18:40:43 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 00:40:43 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 11:46 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I was wondering if there is an easy way of accessing the global index of a > node which is not within the local and ghost node regions on every processor > for DA. > To be more more specific, I am trying to setup a matrix based on a > three-dimensional DA. (Star stencil, width=1). > For some special nodes, I need to insert nonzero values which do not fit in > the local plus the ghost regions of the DA. > I know that I can not use MatSetValuesStencil anymore, but I still can use > MatSetValues. But I need to know the global index of those nodes. > I tried to use DAGetGlobalIndices(), but that would again return the global > indices of the local plus ghost nodes on current processor. > I know that I could use some MPI commands to pass those indices among > processors, but I was wondering if there a clean and neat way with which > each processor can have access to the global index for any given 3-d index. > ((k*N + j)*M + i)*C + c Matt > Thank, > Mohamad > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 18:47:36 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 01:47:36 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 01:40, Matthew Knepley wrote: > I know that I could use some MPI commands to pass those indices among >> processors, but I was wondering if there a clean and neat way with which >> each processor can have access to the global index for any given 3-d index. >> > > ((k*N + j)*M + i)*C + c > Matt, you have seriously missed the point. Mohamad, there is not an easy way to calculate it for an arbitrary index, it necessarily involves a search. Do you need it for arbitrary indices, or do you know in advance which global indices are needed? DAGetAO() can give you what you are after, but there is usually a way to avoid using that code since it is not scalable. Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 18:53:37 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 00:53:37 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 12:47 AM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:40, Matthew Knepley wrote: > >> I know that I could use some MPI commands to pass those indices among >>> processors, but I was wondering if there a clean and neat way with which >>> each processor can have access to the global index for any given 3-d index. >>> >> >> ((k*N + j)*M + i)*C + c >> > > Matt, you have seriously missed the point. > > Mohamad, there is not an easy way to calculate it for an arbitrary index, > it necessarily involves a search. Do you need it for arbitrary indices, or > do you know in advance which global indices are needed? 
DAGetAO() can give > you what you are after, but there is usually a way to avoid using that code > since it is not scalable. > What are you guys talking about? He is asking ("global index for any given 3-d index") for a map (i, j, k) --> ((k*N + j)*M + i)*C + c I can't imagine what you are searching for? The process which owns a given index does not involve a search. Matt > Jed > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 18:56:30 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 01:56:30 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > What are you guys talking about? He is asking ("global index for any given > 3-d index") for a map > > (i, j, k) --> ((k*N + j)*M + i)*C + c > > I can't imagine what you are searching for? The process which owns a given > index does not involve a search. > The "global index" is in the "PETSc ordering". He wants this index for an arbitrary (i,j,k) which are not in the ghosted patch of the current process. You either have to store the full mapping, on search lx,ly,lz to locate the owner, then compute the index relative to that process. I don't think that code exists in PETSc. It wouldn't be too hard to write, but it's not the most beautiful thing to do. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 10 19:01:46 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 01:01:46 +0000 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 12:56 AM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > >> What are you guys talking about? He is asking ("global index for any given >> 3-d index") for a map >> >> (i, j, k) --> ((k*N + j)*M + i)*C + c >> >> I can't imagine what you are searching for? The process which owns a given >> index does not involve a search. >> > > The "global index" is in the "PETSc ordering". He wants this index for an > arbitrary (i,j,k) which are not in the ghosted patch of the current process. > You either have to store the full mapping, on search lx,ly,lz to locate the > owner, then compute the index relative to that process. I don't think that > code exists in PETSc. It wouldn't be too hard to write, but it's not the > most beautiful thing to do. > I am not sure why you would need to access regions outside your local piece. For instance, consider how we treat boundary conditions. This is access to a fixed index, but we check on each process if (i == i_bc) { } Can't you do the same thing with your extra rows? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:05:21 2010 From: mmnasr at gmail.com (Mohamad M. 
Nasr-Azadani) Date: Fri, 10 Dec 2010 17:05:21 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 4:56 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: > >> What are you guys talking about? He is asking ("global index for any given >> 3-d index") for a map >> >> (i, j, k) --> ((k*N + j)*M + i)*C + c >> >> I can't imagine what you are searching for? The process which owns a given >> index does not involve a search. >> > > The "global index" is in the "PETSc ordering". He wants this index for an > arbitrary (i,j,k) which are not in the ghosted patch of the current process. > You either have to store the full mapping, on search lx,ly,lz to locate the > owner, then compute the index relative to that process. I don't think that > code exists in PETSc. It wouldn't be too hard to write, but it's not the > most beautiful thing to do. > Mat, that's exactly came to my mind as the first solution. But I was hoping that I could avoid that. As you said it is not that hard to write that function, that's why I hoped that PETSc alread has that. Thanks for your help, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:07:44 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 17:07:44 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: Sorry for my last message, I meant thanks to Jed, Mohamad On Fri, Dec 10, 2010 at 5:05 PM, Mohamad M. Nasr-Azadani wrote: > > > On Fri, Dec 10, 2010 at 4:56 PM, Jed Brown wrote: > >> On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: >> >>> What are you guys talking about? He is asking ("global index for any >>> given 3-d index") for a map >>> >>> (i, j, k) --> ((k*N + j)*M + i)*C + c >>> >>> I can't imagine what you are searching for? The process which owns a >>> given index does not involve a search. >>> >> >> The "global index" is in the "PETSc ordering". He wants this index for an >> arbitrary (i,j,k) which are not in the ghosted patch of the current process. >> You either have to store the full mapping, on search lx,ly,lz to locate the >> owner, then compute the index relative to that process. I don't think that >> code exists in PETSc. It wouldn't be too hard to write, but it's not the >> most beautiful thing to do. >> > > Mat, that's exactly came to my mind as the first solution. But I was hoping > that I could avoid that. As you said it is not that hard to write that > function, that's why I hoped that PETSc alread has that. > Thanks for your help, > Mohamad > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 19:14:01 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 17:14:01 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: I am not sure why you would need to access regions outside your local piece. For instance, consider how we treat boundary conditions. This is access to a fixed index, but we check on each process if (i == i_bc) { } Can't you do the same thing with your extra rows? Matt Well, my story is a bit complicated. Now that you asked, I would like to have your opinion too. So, what I am trying to to is to use STAR stenctil to descretize and solve Poisson type equations. So far so good, I can take care of the regular nodes with 3D DA's and width=1. 
The problems rises for some nodes that are neighboring solid boundaries. Those nodes, do not follow Poisson equation anymore and they just obey some interpolation equations which might need nodes in the BOX stencil of width=3. I am restricted by memory requirements, and try to avoid creating my matrix using a 3D DA BOX_STENCIL and width=3. That would cost a lot and soon I need to launch simulations with O(10^8) grid points. So, I thought, I create my matrix using STAR_STENCIL and width=1 and just manually insert the new nonzeros into the matrix. Since the number of those nodes is not that many, I thought it would be a good approach. Thanks if you think you could guide me here too. Thanks for your help too, Mohamad On Fri, Dec 10, 2010 at 5:01 PM, Matthew Knepley wrote: > On Sat, Dec 11, 2010 at 12:56 AM, Jed Brown wrote: > >> On Sat, Dec 11, 2010 at 01:53, Matthew Knepley wrote: >> >>> What are you guys talking about? He is asking ("global index for any >>> given 3-d index") for a map >>> >>> (i, j, k) --> ((k*N + j)*M + i)*C + c >>> >>> I can't imagine what you are searching for? The process which owns a >>> given index does not involve a search. >>> >> >> The "global index" is in the "PETSc ordering". He wants this index for an >> arbitrary (i,j,k) which are not in the ghosted patch of the current process. >> You either have to store the full mapping, on search lx,ly,lz to locate the >> owner, then compute the index relative to that process. I don't think that >> code exists in PETSc. It wouldn't be too hard to write, but it's not the >> most beautiful thing to do. >> > > I am not sure why you would need to access regions outside your local > piece. For instance, > consider how we treat boundary conditions. This is access to a fixed index, > but we check on > each process > > if (i == i_bc) { } > > Can't you do the same thing with your extra rows? > > Matt > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 19:22:46 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 02:22:46 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 02:14, Mohamad M. Nasr-Azadani wrote: > Those nodes, do not follow Poisson equation anymore and they just obey some > interpolation equations which might need nodes in the BOX stencil of > width=3. What kind of boundary condition is this? In any case, you can create an independent DA of the same size as your original, but with a box stencil of width 3. Then DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in petsc-dev) will give you access to those global indices. You will likely have to adjust preallocation for these extra entries (unless, somehow strangely, interpolation actually uses no more points, just from different places). -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Fri Dec 10 20:03:36 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 10 Dec 2010 18:03:36 -0800 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: What kind of boundary condition is this? It is both Dirichlet and Neumann B.C. 
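A small C sketch of the approach described above (an auxiliary DA with a box stencil of width 3 whose local-to-global mapping supplies the indices), assuming 3.1-era calling sequences and one degree of freedom per node. The function and variable names are illustrative; "da_box" stands for a DA created with the same global dimensions as the solution DA but with DA_STENCIL_BOX and width 3.

#include "petscda.h"

/* Translate a global (i,j,k) node index into the global PETSc-ordering
   index needed by MatSetValues(), via the auxiliary box-stencil DA.
   (i,j,k) must lie inside this process's ghost region of da_box. */
PetscErrorCode GlobalIndexFromBoxDA(DA da_box,PetscInt i,PetscInt j,PetscInt k,PetscInt *gidx)
{
  ISLocalToGlobalMapping ltog;
  PetscInt               gxs,gys,gzs,gxm,gym,gzm,local;
  PetscErrorCode         ierr;

  PetscFunctionBegin;
  ierr = DAGetGhostCorners(da_box,&gxs,&gys,&gzs,&gxm,&gym,&gzm);CHKERRQ(ierr);
  /* position of (i,j,k) inside the ghosted local patch of da_box */
  local = ((k - gzs)*gym + (j - gys))*gxm + (i - gxs);
  ierr  = DAGetISLocalToGlobalMapping(da_box,&ltog);CHKERRQ(ierr);
  /* map the ghosted local index to the global (PETSc-ordered) index */
  ierr  = ISLocalToGlobalMappingApply(ltog,1,&local,gidx);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}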
It uses trilinear interpolation to impose the correct boundary condition on the node based on the exact location of the interface. In any case, you can create an independent DA of the same size as your original, but with a box stencil of width 3. Then DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in petsc-dev) will give you access to those global indices. You will likely have to adjust preallocation for these extra entries (unless, somehow strangely, interpolation actually uses no more points, just from different places). That sounds like a good suggestion. I may do this as it seems to be the reasonable and easy way to go. But regarding the memory allocation, for all the regular nodes, for each row, that would be only 7 nonzeroes (STAR stencil). But for the special nodes (close to the boundaries), they need 9 nonzeros per row and that would not necessary follow the STAR stencil. For example, for the node at (i,j,k) it might (depending on the normal direction of the solid surface) add the nonzeros at (i,j,k) (i+1,j+1,k) (i+1,j+2,k) (i+2,j+1,k) (i+2,j+2,k) (i+1,j+1,k+1) (i+1,j+2,k+1) (i+2,j+1,k+1) (i+2,j+2,k+1) Do you think that would add a lot of extra memory allocation? Thanks and have a good weekend, Mohamad On Fri, Dec 10, 2010 at 5:22 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 02:14, Mohamad M. Nasr-Azadani wrote: > >> Those nodes, do not follow Poisson equation anymore and they just obey >> some interpolation equations which might need nodes in the BOX stencil of >> width=3. > > > What kind of boundary condition is this? > > In any case, you can create an independent DA of the same size as your > original, but with a box stencil of width 3. Then > DAGetISLocalToGlobalMapping (generalized to DMGetLocalToGlobalMapping in > petsc-dev) will give you access to those global indices. You will likely > have to adjust preallocation for these extra entries (unless, somehow > strangely, interpolation actually uses no more points, just from different > places). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Dec 10 20:06:30 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 03:06:30 +0100 Subject: [petsc-users] global index distributed arrays In-Reply-To: References: Message-ID: On Sat, Dec 11, 2010 at 03:03, Mohamad M. Nasr-Azadani wrote: > Do you think that would add a lot of extra memory allocation? It's not "a lot", but the first assembly won't go well because the preallocation will be too small (unless you update it). -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbloy at seas.upenn.edu Sat Dec 11 00:21:05 2010 From: lbloy at seas.upenn.edu (Luke Bloy) Date: Sat, 11 Dec 2010 01:21:05 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: <4D031851.4010000@seas.upenn.edu> Matt thanks for the response. I'll give those a try. I'm also interested in try the Cholesky decomposition is there particular external packages that are required to use it? Thanks again. Luke On 12/10/2010 06:22 PM, Matthew Knepley wrote: > On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy > wrote: > > > Thanks for the response. > > On 12/10/2010 04:18 PM, Jed Brown wrote: >> On Fri, Dec 10, 2010 at 22:15, Luke Bloy > > wrote: >> >> My problem is that i have a large number (~500,000) of b >> vectors that I would like to find solutions for. 
My plan is >> to call KSPsolve repeatedly with each b. However I wonder if >> there are any solvers or approaches that might benefit from >> the fact that my A matrix does not change. Are there any >> decompositions that might still be sparse that would offer a >> speed up? >> >> >> 1. What is the high-level problem you are trying to solve? There >> might be a better way. >> > I'm solving a diffusion problem. essentially I have 2,000,000 > possible states for my system to be in. The system evolves based > on a markov matrix M, which describes the probability the system > moves from one state to another. This matrix is extremely sparse > on the < 100,000,000 nonzero elements. The problem is to pump > mass/energy into the system at certain states. What I'm interested > in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I > am pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > I'm certainly open to any suggestions you might have. > >> 2. If you can afford the memory, a direct solve probably makes sense. > > My understanding is the inverses would generally be dense. I > certainly don't have any memory to hold a 2 million by 2 million > dense matrix, I have about 40G to play with. So perhaps a > decomposition might work? Which might you suggest? > > > Try -pc_type lu -pc_mat_factor_package once you > have reconfigured using > > --download-superlu_dist --download-mumps > > They are sparse LU factorization packages that might work. > > Matt > > Thanks > Luke > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Dec 11 00:27:39 2010 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Dec 2010 06:27:39 +0000 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D031851.4010000@seas.upenn.edu> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> <4D031851.4010000@seas.upenn.edu> Message-ID: On Sat, Dec 11, 2010 at 6:21 AM, Luke Bloy wrote: > Matt thanks for the response. I'll give those a try. I'm also interested > in try the Cholesky decomposition is there particular external packages > that are required to use it? > Mumps should do Cholesky for a symmetric matrix. Matt > Thanks again. > Luke > > On 12/10/2010 06:22 PM, Matthew Knepley wrote: > > On Fri, Dec 10, 2010 at 11:03 PM, Luke Bloy wrote: > >> >> Thanks for the response. >> >> On 12/10/2010 04:18 PM, Jed Brown wrote: >> >> On Fri, Dec 10, 2010 at 22:15, Luke Bloy wrote: >> >>> My problem is that i have a large number (~500,000) of b vectors that I >>> would like to find solutions for. My plan is to call KSPsolve repeatedly >>> with each b. However I wonder if there are any solvers or approaches that >>> might benefit from the fact that my A matrix does not change. Are there any >>> decompositions that might still be sparse that would offer a speed up? >> >> >> 1. What is the high-level problem you are trying to solve? There might be >> a better way. >> >> I'm solving a diffusion problem. essentially I have 2,000,000 possible >> states for my system to be in. 
The system evolves based on a markov matrix >> M, which describes the probability the system moves from one state to >> another. This matrix is extremely sparse on the < 100,000,000 nonzero >> elements. The problem is to pump mass/energy into the system at certain >> states. What I'm interested in is the steady state behavior of the system. >> >> basically the dynamics can be summarized as >> >> d_{t+1} = M d_{t} + d_i >> >> Where d_t is the state vector at time t and d_i shows the states I am >> pumping energy into. I want to find d_t as t goes to infinity. >> >> My current approach is to solve the following system. >> >> (I-M) d = d_i >> >> I'm certainly open to any suggestions you might have. >> >> 2. If you can afford the memory, a direct solve probably makes sense. >> >> >> My understanding is the inverses would generally be dense. I certainly >> don't have any memory to hold a 2 million by 2 million dense matrix, I have >> about 40G to play with. So perhaps a decomposition might work? Which might >> you suggest? >> > > Try -pc_type lu -pc_mat_factor_package once you > have reconfigured using > > --download-superlu_dist --download-mumps > > They are sparse LU factorization packages that might work. > > Matt > > >> Thanks >> Luke >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.bloy at gmail.com Sat Dec 11 00:49:49 2010 From: luke.bloy at gmail.com (Luke Bloy) Date: Sat, 11 Dec 2010 01:49:49 -0500 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> Message-ID: <4D031F0D.9040401@gmail.com> On 12/10/2010 06:30 PM, Jed Brown wrote: > On Sat, Dec 11, 2010 at 00:03, Luke Bloy > wrote: > > I'm solving a diffusion problem. essentially I have 2,000,000 > possible states for my system to be in. The system evolves based > on a markov matrix M, which describes the probability the system > moves from one state to another. This matrix is extremely sparse > on the < 100,000,000 nonzero elements. The problem is to pump > mass/energy into the system at certain states. What I'm interested > in is the steady state behavior of the system. > > basically the dynamics can be summarized as > > d_{t+1} = M d_{t} + d_i > > Where d_t is the state vector at time t and d_i shows the states I > am pumping energy into. I want to find d_t as t goes to infinity. > > My current approach is to solve the following system. > > (I-M) d = d_i > > > So you want to do this for some 500,000 d_i? What problem are you > really trying to solve? Is it really to just brute-force compute > states for all these inputs? What are you doing with the resulting > 500k states (all 8 terabytes of it)? Are you, for example, looking > for some d_i that would change the steady state d in a certain way? > Yes I'd like do do this for roughly 500,000 d_i. I'm solving a diffusion problem, I only have local measures of the diffusion process which is what i use to determine that matrix M. 
Now the 500,000 d_i are the boundaries to my diffusion problem, what i need to know is who much of what gets pumped in though a given boundary state exits the system through the other boundaries. Do i really need to do all 500,000 Probably not. this is the highest resolution mesh of the boundary i can compute from my data. A lower res mesh would probably be sufficient. But I wont know until i do it either way I'd like to use the highest res mesh that I can. If you can suggest an alternative approach I am all ears. Luke >> 2. If you can afford the memory, a direct solve probably makes sense. > > My understanding is the inverses would generally be dense. I > certainly don't have any memory to hold a 2 million by 2 million > dense matrix, I have about 40G to play with. So perhaps a > decomposition might work? Which might you suggest? > > > While inverses are almost always dense, sparse factorization is far > from dense. For PDE problems factored in an optimal ordering, the > memory asymptotics are n*log n in 2D and n^{4/3} in 3D. The time > asymptotics are n^{3/2} and n^2 respectively. Compare to n^2 memory, > n^3 time for dense. > > Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sat Dec 11 01:20:55 2010 From: jed at 59A2.org (Jed Brown) Date: Sat, 11 Dec 2010 08:20:55 +0100 Subject: [petsc-users] optimizing repeated calls to KSPsolve? In-Reply-To: <4D031F0D.9040401@gmail.com> References: <4D029757.6060708@seas.upenn.edu> <4D029861.8000508@gmail.com> <4D02B1C3.4040409@gmail.com> <4D031F0D.9040401@gmail.com> Message-ID: What will you do with the 500000 responses? Jed On Dec 11, 2010 7:49 AM, "Luke Bloy" wrote: On 12/10/2010 06:30 PM, Jed Brown wrote: > > On Sat, Dec 11, 2010 at 00:03, Luke Bloy >> 2. If you can afford the memory, a direct solve probably makes sense. >> >> >> My understanding... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Sat Dec 11 08:32:18 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 15:32:18 +0100 Subject: [petsc-users] MatMult Message-ID: <1292077938.2074.38.camel@desktop> Hello again, I compiled one of te examples. I used sparse matix called 02-raefsky3. I used -vec_type cuda and -mat_type seqaijcuda. When I see summary of the operations performed by program there is MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 0 0 0 2100 0 0 0 147 Does time of performing MatMult includes memory transfer for loading matrix in GPU memory or just exact computation time? Thanks in advance. Kuba. From jakub.pola at gmail.com Sat Dec 11 09:36:46 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 16:36:46 +0100 Subject: [petsc-users] Set number of iterations Message-ID: <1292081806.2074.46.camel@desktop> Hi again, I was searching trought documentation but without success. I would like to know how to set the number of iterations during the KSPSolve(). Is it possible to set the iteration numbers to 100 for example? 
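A minimal sketch of capping the iteration count at 100 through the API form of -ksp_max_it mentioned in the reply below; it assumes an existing KSP ksp and vectors b and x, and only the maximum-iterations argument of KSPSetTolerances() is changed, with the tolerances left at their defaults.

ierr = KSPSetTolerances(ksp,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT,100);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* still lets -ksp_max_it on the command line override */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Note that 100 is an upper bound: KSPSolve() may still stop earlier if the convergence tolerances are met.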
Kuba From filippo.spiga at disco.unimib.it Sat Dec 11 10:05:55 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Sat, 11 Dec 2010 11:05:55 -0500 Subject: [petsc-users] Set number of iterations In-Reply-To: <1292081806.2074.46.camel@desktop> References: <1292081806.2074.46.camel@desktop> Message-ID: Specifying the option "-ksp_max_it 10000" after the name of the executable or in the code using "KSPSetTolerances" See: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/KSP/KSPSetTolerances.html Cheers -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." On Sat, Dec 11, 2010 at 10:36 AM, Jakub Pola wrote: > Hi again, > > I was searching trought documentation but without success. I would like > to know how to set the number of iterations during the KSPSolve(). Is it > possible to set the iteration numbers to 100 for example? > > Kuba > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 11 11:50:17 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 11 Dec 2010 11:50:17 -0600 Subject: [petsc-users] MatMult In-Reply-To: <1292077938.2074.38.camel@desktop> References: <1292077938.2074.38.camel@desktop> Message-ID: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. Hence in your run below, yes it includes the copy time down. But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. Barry On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > Hello again, > > I compiled one of te examples. I used sparse matix called 02-raefsky3. 
> I used -vec_type cuda and -mat_type seqaijcuda. > > When I see summary of the operations performed by program there is > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > 0 0 0 2100 0 0 0 147 > > Does time of performing MatMult includes memory transfer for loading > matrix in GPU memory or just exact computation time? > > Thanks in advance. > Kuba. > From jakub.pola at gmail.com Sat Dec 11 12:08:49 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sat, 11 Dec 2010 19:08:49 +0100 Subject: [petsc-users] MatMult In-Reply-To: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> Message-ID: <1292090930.2074.128.camel@desktop> Thank you very much for you answer. That helps me a lot. Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. > > Hence in your run below, yes it includes the copy time down. > > But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. > > If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. > > You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. > > Barry > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > Hello again, > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > When I see summary of the operations performed by program there is > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > 0 0 0 2100 0 0 0 147 > > > > Does time of performing MatMult includes memory transfer for loading > > matrix in GPU memory or just exact computation time? > > > > Thanks in advance. > > Kuba. > > > From jakub.pola at gmail.com Sun Dec 12 14:14:09 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 21:14:09 +0100 Subject: [petsc-users] Create csr matrix Message-ID: <1292184849.8638.4.camel@desktop> Hi, Could you please help me with creating CSR matrix. I have only one processor so It have to be done locally Here is the matrix from those information I would like to have matrix petsc matrix. 
double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; int r_idx [] = {0, 2, 3, 6, 8} ; int n_rows = 4 ; //square matrix Thank you in advance for help. Kuba From bsmith at mcs.anl.gov Sun Dec 12 14:30:04 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 12 Dec 2010 14:30:04 -0600 Subject: [petsc-users] Create csr matrix In-Reply-To: <1292184849.8638.4.camel@desktop> References: <1292184849.8638.4.camel@desktop> Message-ID: <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> MatCreateSeqAIJWithArrays() On Dec 12, 2010, at 2:14 PM, Jakub Pola wrote: > Hi, > > Could you please help me with creating CSR matrix. I have only one > processor so It have to be done locally > > Here is the matrix from those information I would like to have matrix > petsc matrix. > > double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; > int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; > int r_idx [] = {0, 2, 3, 6, 8} ; > int n_rows = 4 ; //square matrix > > Thank you in advance for help. > Kuba > > > > From jakub.pola at gmail.com Sun Dec 12 14:41:38 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 21:41:38 +0100 Subject: [petsc-users] Create csr matrix In-Reply-To: <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> Message-ID: <1292186498.8638.7.camel@desktop> Thanks, Is there also so easy way to extract those arrays from already created matrix? I created matrix A with function: MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, vals, A); Now I would like to extract all tables from matrix A; Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: > MatCreateSeqAIJWithArrays From bsmith at mcs.anl.gov Sun Dec 12 14:58:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 12 Dec 2010 14:58:16 -0600 Subject: [petsc-users] Create csr matrix In-Reply-To: <1292186498.8638.7.camel@desktop> References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> <1292186498.8638.7.camel@desktop> Message-ID: On Dec 12, 2010, at 2:41 PM, Jakub Pola wrote: > Thanks, > > Is there also so easy way to extract those arrays from already created > matrix? Not particularly because we do not like the idea of directly manipulating the storage details of a particular sparse matrix type. But you can use MatGetRowIJ()/MatRestoreRowIJ() and MatGetArray() if you really want to. > I created matrix A with function: > MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, > vals, A); > > Now I would like to extract all tables from matrix A; What is a "table" of a matrix A? Barry > > > Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: >> MatCreateSeqAIJWithArrays > From jakub.pola at gmail.com Sun Dec 12 15:15:08 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Sun, 12 Dec 2010 22:15:08 +0100 Subject: [petsc-users] Create csr matrix In-Reply-To: References: <1292184849.8638.4.camel@desktop> <2BC7307C-1FD9-4880-971E-F2C9314DDC79@mcs.anl.gov> <1292186498.8638.7.camel@desktop> Message-ID: <1292188508.8638.14.camel@desktop> The reason I want to perform this operation is that I have CG solver based on GPU. It takes arguments in the way I have written so double vals [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0} ; int c_idx [] = {1, 3, 2, 0, 1, 2, 2, 3} ; int r_idx [] = {0, 2, 3, 6, 8} ; int n_rows = 4 ; //square matrix That will help me a lot with testing. 
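A minimal, self-contained C sketch of the MatCreateSeqAIJWithArrays() suggestion using exactly the arrays above. PETSc references the three CSR arrays directly rather than copying them, so they must stay allocated for the lifetime of the matrix.

#include "petscmat.h"

int main(int argc,char **argv)
{
  PetscScalar    vals[]  = {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0};
  PetscInt       c_idx[] = {1,3,2,0,1,2,2,3};
  PetscInt       r_idx[] = {0,2,3,6,8};
  PetscInt       n_rows  = 4;                    /* square matrix */
  Mat            A;
  PetscErrorCode ierr;

  PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);
  /* wrap the existing CSR arrays as a sequential AIJ matrix (no copy is made) */
  ierr = MatCreateSeqAIJWithArrays(PETSC_COMM_SELF,n_rows,n_rows,r_idx,c_idx,vals,&A);CHKERRQ(ierr);
  ierr = MatView(A,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);            /* the user arrays themselves are not freed by PETSc */
  ierr = PetscFinalize();
  return 0;
}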
I can load matrix using PETSC then I would like to extract this matrix to those arrays. Dnia 2010-12-12, nie o godzinie 14:58 -0600, Barry Smith pisze: > On Dec 12, 2010, at 2:41 PM, Jakub Pola wrote: > > > Thanks, > > > > Is there also so easy way to extract those arrays from already created > > matrix? > > Not particularly because we do not like the idea of directly manipulating the storage details of a particular sparse matrix type. But you can use MatGetRowIJ()/MatRestoreRowIJ() and MatGetArray() if you really want to. > > > > I created matrix A with function: > > MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n_rows, n_rows, r_idx, c_idx, > > vals, A); > > > > > Now I would like to extract all tables from matrix A; > > What is a "table" of a matrix A? > > Barry > > > > > > > Dnia 2010-12-12, nie o godzinie 14:30 -0600, Barry Smith pisze: > >> MatCreateSeqAIJWithArrays > > > From filippo.spiga at disco.unimib.it Mon Dec 13 00:35:30 2010 From: filippo.spiga at disco.unimib.it (Filippo Spiga) Date: Mon, 13 Dec 2010 01:35:30 -0500 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: I was referring to the input file of my application, I do not want to pass PETSc option by the command line or using another specific file input. Anyway, I tried the strategy "KSPSetOptionsPrefix(..)"/"SNES SetOptionsPrefix(...);" + "PetscOptionsSetValue(...)" and it works very well!!! But now I have another question. Using these routines, every time I cyclically call KSP and/or SNES in my program I put new options inside the PETSc option database. Most of these options, that have the same prefix, are replicated. Is this a problem? Is it always true that I will use the last one inserted? Let's consider this example: begin program begin program iteration number 1 CreateKSP(myksp1); PetscOptionsSetValue("-a", "1"); PetscOptionsSetValue("-b", "2"); KSPSolve(myksp1) DestroyKSP(myksp1); end program iteration number 1 begin program iteration number 2 CreateKSP(myksp2); PetscOptionsSetValue("-a", "5"); PetscOptionsSetValue("-c", "5"); KSPSolve(myksp2); DestroyKSP(myksp2); end program iteration number 2 begin program iteration number 3 CreateKSP(myksp3); PetscOptionsSetValue("-a", "10"); PetscOptionsSetValue("-b", "20"); KESPSolve(myksp3); DestroyKSP(myksp3); end program iteration number 3 end program In the iteration 2 appears the option "-c", what can I do to remove the option "-b" inside the internal PETSc option database? And similar, in the iteration 3 the option "-c" disappears and the option "-b" returns with a different value. What can I do to remove the option "-c" inside the internal PETSc option database? Is there a routine that reset/empty completely the internal PETSc database option without terminate the program? Or the only way is to assign a different prefix for every program iteration? Thanks very much in advance, Cheers! -- Filippo SPIGA, MSc Computer Science ~ homepage: http://tinyurl.com/fspiga ~ ?Nobody will drive us out of Cantor's paradise.? -- David Hilbert ***** Disclaimer: "Please note this message and any attachment are CONFIDENTIAL an may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission." 
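A short C sketch of the prefix scheme described in the message above, so that options set for one loop iteration cannot collide with those of a previous one. The prefix string "loop3_" and the option values are illustrative only, and the fragment assumes it runs inside a function after PetscInitialize().

KSP            ksp;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOptionsPrefix(ksp,"loop3_");CHKERRQ(ierr);               /* this solver reads only -loop3_* options */
ierr = PetscOptionsSetValue("-loop3_ksp_type","gmres");CHKERRQ(ierr);
ierr = PetscOptionsSetValue("-loop3_ksp_max_it","100");CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                          /* applies the prefixed options to this KSP */
/* ... KSPSolve(), then KSPDestroy(ksp); a stale entry can later be removed
   with PetscOptionsClearValue("-loop3_ksp_max_it") if desired ... */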
On Fri, Dec 10, 2010 at 2:09 PM, Jed Brown wrote: > On Fri, Dec 10, 2010 at 19:54, Filippo Spiga < > filippo.spiga at disco.unimib.it> wrote: > >> I know that it is possible to use API to set parameters (KSPSetType, >> SNESSetType). But a lot of options of HYPRE or SUPERLU for example have no >> API. Instead of mix option from command-line and API I would like to put >> everything in a config file > > > You might be interested in the -options_file command line option and > PetscOptionsInsertFile. Also, any options present in > > ~/.petscrc > .petscrc (in current directory) > petscrc (in current directory) > > get slurped in automatically, as well as the string in the PETSC_OPTIONS > environment variable. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Mon Dec 13 00:43:26 2010 From: jed at 59A2.org (Jed Brown) Date: Sun, 12 Dec 2010 22:43:26 -0800 Subject: [petsc-users] About the "PetscOptionsSetValue" usage In-Reply-To: References: Message-ID: On Sun, Dec 12, 2010 at 22:35, Filippo Spiga wrote: > What can I do to remove the option "-c" inside the internal PETSc option > database? > PetscOptionsClearValue() > Is there a routine that reset/empty completely the internal PETSc database > option without terminate the program? > PetscOptionsClear() > Or the only way is to assign a different prefix for every program > iteration? > What do you have to change differently on each iteration, but want to read from a config file? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Mon Dec 13 01:29:16 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Mon, 13 Dec 2010 08:29:16 +0100 Subject: [petsc-users] MatMult In-Reply-To: <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> Message-ID: <1292225356.1803.4.camel@desktop> Hi, Does MatMult function is performed on GPU? when I prepared program which just executes this function with parameters -vec_type cuda and -mat_type seqaijcuda i havent seen in summary log any VecCUDACopyTo entry Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > To answer this you need to understand that PETSc copies vectors and matrices to the GPU memory "on demand" (that is exactly when they are first needed on the GPU, and not before) and once it has copied to the GPU it keeps track of it and will NOT copy it down again if it is already there. > > Hence in your run below, yes it includes the copy time down. > > But note that ONE multiply on the GPU is absurd, it does not make sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER do "sandalone" benchmarking where a single kernel is called by it self once, the time results are useless. Always run a FULL application with -log_summary; for example in this case a full KSPSolve() that requires a bunch of iterations. Then you can look at the performance of each kernel. The reason to do it this way is that the numbers can be very different and what matters is runs in APPLICATIONS so that is what should be measured. > > If say you run KSP with 20 iterations then the time to copy the matrix down to the GPU is amortized over those 20 iterations and thus maybe ok. You should see the flop rate for the MatMult() go up in this case. 
> > You may have noticed we have a log entry for VecCopyToGPU() we will be adding one for matrices as well thus you will be able to see how long the copy time is but not that the copy time is still counted in the MatMult() time if the first copy of the matrix to GPU is triggered by the MatMult. You can subtract the copy time from the mult time to get the per multiply time, this would correspond to the multiply time in the limit of a single copy down and many, many multiplies on the GPU. > > Barry > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > Hello again, > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > When I see summary of the operations performed by program there is > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > 0 0 0 2100 0 0 0 147 > > > > Does time of performing MatMult includes memory transfer for loading > > matrix in GPU memory or just exact computation time? > > > > Thanks in advance. > > Kuba. > > > From knepley at gmail.com Mon Dec 13 01:37:00 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Dec 2010 07:37:00 +0000 Subject: [petsc-users] MatMult In-Reply-To: <1292225356.1803.4.camel@desktop> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> Message-ID: Yes, it should run on the GPU. Check an example, like ex19. Matt On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola wrote: > Hi, > > Does MatMult function is performed on GPU? when I prepared program which > just executes this function with parameters -vec_type cuda and -mat_type > seqaijcuda i havent seen in summary log any VecCUDACopyTo entry > > > Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze: > > To answer this you need to understand that PETSc copies vectors and > matrices to the GPU memory "on demand" (that is exactly when they are first > needed on the GPU, and not before) and once it has copied to the GPU it > keeps track of it and will NOT copy it down again if it is already there. > > > > Hence in your run below, yes it includes the copy time down. > > > > But note that ONE multiply on the GPU is absurd, it does not make > sense to copy a matrix down to the GPU and then do ONE multiply with it. > Thus I NEVER do "sandalone" benchmarking where a single kernel is called by > it self once, the time results are useless. Always run a FULL application > with -log_summary; for example in this case a full KSPSolve() that requires > a bunch of iterations. Then you can look at the performance of each kernel. > The reason to do it this way is that the numbers can be very different and > what matters is runs in APPLICATIONS so that is what should be measured. > > > > If say you run KSP with 20 iterations then the time to copy the matrix > down to the GPU is amortized over those 20 iterations and thus maybe ok. You > should see the flop rate for the MatMult() go up in this case. > > > > You may have noticed we have a log entry for VecCopyToGPU() we will be > adding one for matrices as well thus you will be able to see how long the > copy time is but not that the copy time is still counted in the MatMult() > time if the first copy of the matrix to GPU is triggered by the MatMult. You > can subtract the copy time from the mult time to get the per multiply time, > this would correspond to the multiply time in the limit of a single copy > down and many, many multiplies on the GPU. 
> > > > Barry > > > > > > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > > > Hello again, > > > > > > I compiled one of te examples. I used sparse matix called 02-raefsky3. > > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > > > When I see summary of the operations performed by program there is > > > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 > > > 0 0 0 2100 0 0 0 147 > > > > > > Does time of performing MatMult includes memory transfer for loading > > > matrix in GPU memory or just exact computation time? > > > > > > Thanks in advance. > > > Kuba. > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.pola at gmail.com Mon Dec 13 02:20:57 2010 From: jakub.pola at gmail.com (Jakub Pola) Date: Mon, 13 Dec 2010 09:20:57 +0100 Subject: [petsc-users] MatMult In-Reply-To: References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> Message-ID: <1292228457.1803.34.camel@desktop> Could you please check the file attached to this email. there is source code and log summary from execution of mat mult. When I run the ex131 with parameters -vec_type cuda and -mat_type seqaijcuda mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda -vec_type cuda -log_summary it fails because of CUDA Error 4. see MatMultKO.log When I run the same program without -vec_type cuda parameter only with -mat_type seqaijcuda it run ok. mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda -log_summary MatMltOK.log When I run without -math_type seqaijcuda only with -vec_type cuda it fails again because terminate called after throwing an instance of 'thrust::system::system_error' what(): invalid argument terminate called after throwing an instance of 'thrust::system::system_error' what(): invalid argument -------------------------------------------------------------------------- mpiexec noticed that process rank 0 with PID 3755 on node desktop exited on signal 6 (Aborted). -------------------------------------------------------------------------- Could you please give me some comments on that Dnia 2010-12-13, pon o godzinie 07:37 +0000, Matthew Knepley pisze: > Yes, it should run on the GPU. Check an example, like ex19. > > > Matt > > On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola > wrote: > Hi, > > Does MatMult function is performed on GPU? when I prepared > program which > just executes this function with parameters -vec_type cuda and > -mat_type > seqaijcuda i havent seen in summary log any VecCUDACopyTo > entry > > > Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith > pisze: > > > > To answer this you need to understand that PETSc copies > vectors and matrices to the GPU memory "on demand" (that is > exactly when they are first needed on the GPU, and not before) > and once it has copied to the GPU it keeps track of it and > will NOT copy it down again if it is already there. > > > > Hence in your run below, yes it includes the copy time > down. > > > > But note that ONE multiply on the GPU is absurd, it does > not make sense to copy a matrix down to the GPU and then do > ONE multiply with it. Thus I NEVER do "sandalone" benchmarking > where a single kernel is called by it self once, the time > results are useless. 
Always run a FULL application with > -log_summary; for example in this case a full KSPSolve() that > requires a bunch of iterations. Then you can look at the > performance of each kernel. The reason to do it this way is > that the numbers can be very different and what matters is > runs in APPLICATIONS so that is what should be measured. > > > > If say you run KSP with 20 iterations then the time to > copy the matrix down to the GPU is amortized over those 20 > iterations and thus maybe ok. You should see the flop rate for > the MatMult() go up in this case. > > > > You may have noticed we have a log entry for > VecCopyToGPU() we will be adding one for matrices as well thus > you will be able to see how long the copy time is but not that > the copy time is still counted in the MatMult() time if the > first copy of the matrix to GPU is triggered by the MatMult. > You can subtract the copy time from the mult time to get the > per multiply time, this would correspond to the multiply time > in the limit of a single copy down and many, many multiplies > on the GPU. > > > > Barry > > > > > > > > > > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: > > > > > Hello again, > > > > > > I compiled one of te examples. I used sparse matix called > 02-raefsky3. > > > I used -vec_type cuda and -mat_type seqaijcuda. > > > > > > When I see summary of the operations performed by program > there is > > > > > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 2100 > > > 0 0 0 2100 0 0 0 147 > > > > > > Does time of performing MatMult includes memory transfer > for loading > > > matrix in GPU memory or just exact computation time? > > > > > > Thanks in advance. > > > Kuba. > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- A non-text attachment was scrubbed... Name: tests.zip Type: application/zip Size: 4031 bytes Desc: not available URL: From maxwindiff at gmail.com Mon Dec 13 04:30:11 2010 From: maxwindiff at gmail.com (Max Ng) Date: Mon, 13 Dec 2010 18:30:11 +0800 Subject: [petsc-users] run direct linear solver in parallel Message-ID: Hi, I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. My program runs correctly in mpiexec -n 1. When I try mpiexec -n 2, I got this error: Assertion failed in file helper_fns.c at line 337: 0 memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 internal ABORT - process 1 Assertion failed in file helper_fns.c at line 337: 0 memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 internal ABORT - process 0 rank 1 in job 113 vm1_57881 caused collective abort of all ranks exit status of rank 1: killed by signal 9 Here is the source code: // N = 40000, n = 20000, nnz = 9 // MatCreate(comm, &mat); MatSetType(mat, MATAIJ); MatSetSizes(mat, n, n, N, N); MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); // some code to fill the matrix values // ... 
KSPCreate(comm, &ksp); KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); KSPSetType(ksp, KSPPREONLY); KSPGetPC(ksp, &pc); PCSetType(pc, PCLU); PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); KSPSetUp(ksp); It crashes at the KSPSetUp() statement. Do you have any ideas? Thanks in advance! Max Ng On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > Hi everyone,> > I am wondering how I can run the direct solver in parallel. I can run> my program in a single processor with direct linear solver by> > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles> > However, when I try to run it with mpi:> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu> -pc_factor_mat_solver_package spooles> > I got error like this:> > [0]PETSC ERROR: --------------------- Error Message> ------------------------------------> [0]PETSC ERROR: No support for this operation for this object type!> [0]PETSC ERROR: Matrix type mpiaij symbolic LU!> > [0]PETSC ERROR: Libraries linked from> /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt> [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008> [0]PETSC ERROR: Configure options --with-shared --with-dynamic> --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi> --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack> --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse> --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]"> --with-superlu=1 --with-superlu-include=/usr/include/superlu> --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1> --with-spooles-include=/usr/include/spooles> --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1> --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr> [0]PETSC ERROR:> ------------------------------------------------------------------------> [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c> [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c> -------------------------------------------------------> > Would you like to tell me where I am doing wrong? I appreciate your help.> > Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 13 08:34:56 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 13 Dec 2010 08:34:56 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: Message-ID: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> The problem is not in PETSc. Run in the debugger and see exactly where this memcpy() overlap happens and if it can be fixed. Barry On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > Hi, > > I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. > > My program runs correctly in mpiexec -n 1. 
When I try mpiexec -n 2, I got this error: > > Assertion failed in file helper_fns.c at line 337: 0 > memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 > > internal ABORT - process 1 > Assertion failed in file helper_fns.c at line 337: 0 > memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 > > internal ABORT - process 0 > rank 1 in job 113 vm1_57881 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 > > Here is the source code: > > // N = 40000, n = 20000, nnz = 9 > // > MatCreate(comm, &mat); > MatSetType(mat, MATAIJ); > MatSetSizes(mat, n, n, N, N); > MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); > MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); > > // some code to fill the matrix values > // ... > > KSPCreate(comm, &ksp); > KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); > KSPSetType(ksp, KSPPREONLY); > > KSPGetPC(ksp, &pc); > PCSetType(pc, PCLU); > PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); > > KSPSetUp(ksp); > > It crashes at the KSPSetUp() statement. > > Do you have any ideas? Thanks in advance! > > Max Ng > > On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> > Hi everyone, >> > >> > I am wondering how I can run the direct solver in parallel. I can run >> > my program in a single processor with direct linear solver by >> > >> > ./foo.out -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >> > >> > However, when I try to run it with mpi: >> > >> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >> > -pc_factor_mat_solver_package spooles >> > >> > I got error like this: >> > >> > [0]PETSC ERROR: --------------------- Error Message >> > ------------------------------------ >> > [0]PETSC ERROR: No support for this operation for this object type! >> > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! >> > >> > [0]PETSC ERROR: Libraries linked from >> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic >> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >> > --with-superlu=1 --with-superlu-include=/usr/include/superlu >> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >> > --with-spooles-include=/usr/include/spooles >> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >> > [0]PETSC ERROR: >> > ------------------------------------------------------------------------ >> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >> > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >> > ------------------------------------------------------- >> > >> > Would you like to tell me where I am doing wrong? I appreciate your help. >> > >> > Xiangdong > From hzhang at mcs.anl.gov Mon Dec 13 09:00:51 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 13 Dec 2010 09:00:51 -0600 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> References: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> Message-ID: Max, Does superlu_dist crash? 
Spooles has been out of support from its developers for more than 10 years. For small testing problems, it can be faster. Mumps is a good and robust direct solver we usually recommend, but it requires f90. Hong On Mon, Dec 13, 2010 at 8:34 AM, Barry Smith wrote: > > ? The problem is not in PETSc. ? ?Run in the debugger ?and see exactly where this memcpy() overlap happens and if it can be fixed. > > ?Barry > > > On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > >> Hi, >> >> I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use SPOOLES because I need to build on Windows with VC++ (and without a Fortran compiler). And in my tests somehow SPOOLES performs better than SuperLU. >> >> My program runs correctly in mpiexec -n 1. When I try mpiexec -n 2, I got this error: >> >> Assertion failed in file helper_fns.c at line 337: 0 >> memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 len_=4 >> >> internal ABORT - process 1 >> Assertion failed in file helper_fns.c at line 337: 0 >> memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 len_=4 >> >> internal ABORT - process 0 >> rank 1 in job 113 ?vm1_57881 ? caused collective abort of all ranks >> ? exit status of rank 1: killed by signal 9 >> >> Here is the source code: >> >> ? ? ? ? ? ? // N = 40000, n = 20000, nnz = 9 >> ? ? ? ? ? ? // >> ? ? ? ? ? ? MatCreate(comm, &mat); >> ? ? ? ? ? ? MatSetType(mat, MATAIJ); >> ? ? ? ? ? ? MatSetSizes(mat, n, n, N, N); >> ? ? ? ? ? ? MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); >> ? ? ? ? ? ? MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, PETSC_NULL); >> >> ? ? ? ? ? ? // some code to fill the matrix values >> ? ? ? ? ? ? // ... >> >> ? ? ? ? ? ? KSPCreate(comm, &ksp); >> ? ? ? ? ? ? KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); >> ? ? ? ? ? ? KSPSetType(ksp, KSPPREONLY); >> >> ? ? ? ? ? ? KSPGetPC(ksp, &pc); >> ? ? ? ? ? ? PCSetType(pc, PCLU); >> ? ? ? ? ? ? PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); >> >> ? ? ? ? ? ? KSPSetUp(ksp); >> >> It crashes at the KSPSetUp() statement. >> >> Do you have any ideas? Thanks in advance! >> >> Max Ng >> >> On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: >> >>> > Hi everyone, >>> > >>> > I am wondering how I can run the direct solver in parallel. I can run >>> > my program in a single processor with direct linear solver by >>> > >>> > ./foo.out ?-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles >>> > >>> > However, when I try to run it with mpi: >>> > >>> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu >>> > -pc_factor_mat_solver_package spooles >>> > >>> > I got error like this: >>> > >>> > [0]PETSC ERROR: --------------------- Error Message >>> > ------------------------------------ >>> > [0]PETSC ERROR: No support for this operation for this object type! >>> > [0]PETSC ERROR: Matrix type mpiaij ?symbolic LU! 
>>> > >>> > [0]PETSC ERROR: Libraries linked from >>> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt >>> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 >>> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic >>> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi >>> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack >>> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse >>> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" >>> > --with-superlu=1 --with-superlu-include=/usr/include/superlu >>> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 >>> > --with-spooles-include=/usr/include/spooles >>> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 >>> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr >>> > [0]PETSC ERROR: >>> > ------------------------------------------------------------------------ >>> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in src/mat/interface/matrix.c >>> > [0]PETSC ERROR: PCSetUp_LU() line 257 in src/ksp/pc/impls/factor/lu/lu.c >>> > ------------------------------------------------------- >>> > >>> > Would you like to tell me where I am doing wrong? I appreciate your help. >>> > >>> > Xiangdong >> > > From u.tabak at tudelft.nl Mon Dec 13 09:11:41 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Mon, 13 Dec 2010 16:11:41 +0100 Subject: [petsc-users] class templates in C++ and Petsc functions Message-ID: <4D0637AD.6060206@tudelft.nl> Dear all, I was trying to write a simple class template code to interface Petsc matrices along with Boost sparse matrices and vice versa. However, my first naive try did not work out of the box, since the template class requires some functions from PETSc libraries and these functions should be located in the object files to be able to compile. Since class template is header only and there is no object code generation, it can not find the PETSc library functions which is logical. I am also new to templates in C++. I just gave a try to create a template class instead of code duplication however it did not end up as expected. Are there some smarter ways to accomplish this task. The function analyzes a boost csr matrix and sets the rows of a Petsc matrix. T is for boost matrices and T1 for Mat in Petsc. 
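For reference, the linking question raised above usually comes down to where the PETSc symbols get resolved rather than to the template itself: a header-only function template needs no PETSc object code when it is parsed, and the PETSc calls it makes are resolved when the translation unit that instantiates it is linked against the PETSc libraries in the usual way. A minimal sketch of such a helper is below; BoostToPetsc and BoostMatrix are placeholder names, the preallocation is left at the default, and whether this matches the failure described above depends on the actual compiler or linker message, which is not shown in this thread.

#include "petscmat.h"

// Minimal header-only helper (placeholder names, not code from this thread).
// The template needs no PETSc object code by itself; MatCreateSeqAIJ() and the
// assembly calls are resolved when the instantiating .cpp is linked against
// the PETSc libraries.
template <typename BoostMatrix>
PetscErrorCode BoostToPetsc(const BoostMatrix &A, Mat *B)
{
  PetscErrorCode ierr;
  // default preallocation for brevity; a real converter would pass per-row counts
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, (PetscInt)A.size1(), (PetscInt)A.size2(),
                         PETSC_DEFAULT, PETSC_NULL, B); CHKERRQ(ierr);
  // ... iterate over A and call MatSetValues() here ...
  ierr = MatAssemblyBegin(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  return 0;
}

The converter template in question follows below.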
I can also get some comments on the code ;) Best regards, Umut template int converter::convertMBo2Pe( const T& A, T1 A_ ){ PetscErrorCode ierr; int cntNnz = 0; typedef typename T::iterator1 i1_t; typedef typename T::iterator2 i2_t; //int nnz[ooFelieMatrix.size1()]; int nnz[A.size1()]; unsigned ind=0; //get information about the matrix double* vals = NULL; for (i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) { nnz[ind] = distance(i1.begin(), i1.end()); ind++; } // create the matrix depending // on the values of the nonzeros // on each row ierr = MatCreateSeqAIJ( PETSC_COMM_SELF, A.size1(), A.size2(), cntNnz, nnz, A_ ); PetscInt rInd = 0, cInd=0; PetscInt* rCount, dummy; rCount = &dummy; // pointer to values in a row PetscScalar* valsOfRowI = NULL; PetscInt* colIndexOfRowI = NULL; PetscInt rC = 1; for(i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) { // allocate space for the values of row I valsOfRowI = new PetscScalar[nnz[rInd]]; colIndexOfRowI = new PetscInt[nnz[rInd]]; for(i2_t i2 = i1.begin(); i2 != i1.end(); ++i2) { colIndexOfRowI[cInd] = i2.index2(); valsOfRowI[cInd] = *i2; cInd++; } // setting one row each time *rCount = rInd; MatSetValues( A_, rC, rCount, nnz[rInd], colIndexOfRowI, valsOfRowI, INSERT_VALUES ); // delete delete [] valsOfRowI; delete [] colIndexOfRowI; rInd++; cInd = 0; } // MatAssemblyBegin( A_, MAT_FINAL_ASSEMBLY ); MatAssemblyEnd( A_, MAT_FINAL_ASSEMBLY ); // return return 0; } -- - Hope is a good thing, maybe the best of things and no good thing ever dies... The Shawshank Redemption, replique of Tim Robbins From maxwindiff at gmail.com Mon Dec 13 09:12:46 2010 From: maxwindiff at gmail.com (Max Ng) Date: Mon, 13 Dec 2010 23:12:46 +0800 Subject: [petsc-users] run direct linear solver in parallel In-Reply-To: References: <3B74C042-79A7-4139-A7B0-1C7F0EF02EDC@mcs.anl.gov> Message-ID: Hi, The error seems to be trapped by MPICH2's assertions. Is there some way to propagate them to debuggers (gdb, whatever)? Yep, I think I'll try SuperLU_dist again then. Thanks for your advices! Max On Mon, Dec 13, 2010 at 11:00 PM, Hong Zhang wrote: > Max, > Does superlu_dist crash? > Spooles has been out of support from its developers for more than 10 years. > For small testing problems, it can be faster. > > Mumps is a good and robust direct solver we usually recommend, but it > requires f90. > > Hong > > On Mon, Dec 13, 2010 at 8:34 AM, Barry Smith wrote: > > > > The problem is not in PETSc. Run in the debugger and see exactly > where this memcpy() overlap happens and if it can be fixed. > > > > Barry > > > > > > On Dec 13, 2010, at 4:30 AM, Max Ng wrote: > > > >> Hi, > >> > >> I am having a similar problem and I'm using PETSc 3.1-p6. I wish to use > SPOOLES because I need to build on Windows with VC++ (and without a Fortran > compiler). And in my tests somehow SPOOLES performs better than SuperLU. > >> > >> My program runs correctly in mpiexec -n 1. 
When I try mpiexec -n 2, I > got this error: > >> > >> Assertion failed in file helper_fns.c at line 337: 0 > >> memcpy argument memory ranges overlap, dst_=0x972ef84 src_=0x972ef84 > len_=4 > >> > >> internal ABORT - process 1 > >> Assertion failed in file helper_fns.c at line 337: 0 > >> memcpy argument memory ranges overlap, dst_=0x90c4018 src_=0x90c4018 > len_=4 > >> > >> internal ABORT - process 0 > >> rank 1 in job 113 vm1_57881 caused collective abort of all ranks > >> exit status of rank 1: killed by signal 9 > >> > >> Here is the source code: > >> > >> // N = 40000, n = 20000, nnz = 9 > >> // > >> MatCreate(comm, &mat); > >> MatSetType(mat, MATAIJ); > >> MatSetSizes(mat, n, n, N, N); > >> MatSeqAIJSetPreallocation(mat, nnz, PETSC_NULL); > >> MatMPIAIJSetPreallocation(mat, nnz, PETSC_NULL, nnz, > PETSC_NULL); > >> > >> // some code to fill the matrix values > >> // ... > >> > >> KSPCreate(comm, &ksp); > >> KSPSetOperators(ksp, mat, mat, DIFFERENT_NONZERO_PATTERN); > >> KSPSetType(ksp, KSPPREONLY); > >> > >> KSPGetPC(ksp, &pc); > >> PCSetType(pc, PCLU); > >> PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SPOOLES); > >> > >> KSPSetUp(ksp); > >> > >> It crashes at the KSPSetUp() statement. > >> > >> Do you have any ideas? Thanks in advance! > >> > >> Max Ng > >> > >> On Dec 3, 2010, at 4:19 PM, Xiangdong Liang wrote: > >> > >>> > Hi everyone, > >>> > > >>> > I am wondering how I can run the direct solver in parallel. I can run > >>> > my program in a single processor with direct linear solver by > >>> > > >>> > ./foo.out -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package spooles > >>> > > >>> > However, when I try to run it with mpi: > >>> > > >>> > mpirun.openmpi -np 2 ./foo.out -ksp_type preonly -pc_type lu > >>> > -pc_factor_mat_solver_package spooles > >>> > > >>> > I got error like this: > >>> > > >>> > [0]PETSC ERROR: --------------------- Error Message > >>> > ------------------------------------ > >>> > [0]PETSC ERROR: No support for this operation for this object type! > >>> > [0]PETSC ERROR: Matrix type mpiaij symbolic LU! > >>> > > >>> > [0]PETSC ERROR: Libraries linked from > >>> > /home/hazelsct/petsc-2.3.3/lib/linux-gnu-c-opt > >>> > [0]PETSC ERROR: Configure run at Mon Jun 30 14:37:52 2008 > >>> > [0]PETSC ERROR: Configure options --with-shared --with-dynamic > >>> > --with-debugging=0 --useThreads 0 --with-mpi-dir=/usr/lib/openmpi > >>> > --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack > >>> > --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse > >>> > --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" > >>> > --with-superlu=1 --with-superlu-include=/usr/include/superlu > >>> > --with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1 > >>> > --with-spooles-include=/usr/include/spooles > >>> > --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 > >>> > --with-hypre-dir=/usr --with-babel=1 --with-babel-dir=/usr > >>> > [0]PETSC ERROR: > >>> > > ------------------------------------------------------------------------ > >>> > [0]PETSC ERROR: MatLUFactorSymbolic() line 2174 in > src/mat/interface/matrix.c > >>> > [0]PETSC ERROR: PCSetUp_LU() line 257 in > src/ksp/pc/impls/factor/lu/lu.c > >>> > ------------------------------------------------------- > >>> > > >>> > Would you like to tell me where I am doing wrong? I appreciate your > help. > >>> > > >>> > Xiangdong > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
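For reference, the package switch mentioned above is a one-line change to the KSP/PC setup shown earlier in this thread, assuming PETSc was built with SuperLU_dist (or MUMPS) support; the function name UseParallelLU is a placeholder, and the MAT_SOLVER_* constants are the petsc-3.1 spellings already used in this thread. A sketch:

#include "petscksp.h"

/* Sketch: point an existing KSP at a parallel direct solver instead of
   SPOOLES. Assumes PETSc was configured with SuperLU_dist (or MUMPS). */
PetscErrorCode UseParallelLU(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
  /* MAT_SOLVER_MUMPS is the usual alternative when a Fortran compiler is available */
  ierr = PCFactorSetMatSolverPackage(pc, MAT_SOLVER_SUPERLU_DIST); CHKERRQ(ierr);
  return 0;
}

The same selection can be made without recompiling through the runtime options already used in this thread, for example -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist.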
URL: From m.skates82 at gmail.com Mon Dec 13 11:10:25 2010 From: m.skates82 at gmail.com (Nunion) Date: Mon, 13 Dec 2010 11:10:25 -0600 Subject: [petsc-users] Writing PETSc matrices In-Reply-To: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> References: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> Message-ID: What files should one use to convert vectors from Matlab to PETSc (PetscBinaryWrite is for square matrices)? I have attempted to write directly to binary from Matlab a matrix + vector, and only a vector (using the save command with the -mat option), then read the binary file into PETSc (using ex34.c in ...src/mat/tests); however, the format is not recognized. Thanks, Tom On Tue, Oct 26, 2010 at 3:57 PM, Barry Smith wrote: > > Use PetscBinaryWrite('filename',sparsematlabmatrix) I do not know why > your second argument has quotes around it. > > Barry > > > On Oct 26, 2010, at 3:33 PM, Nunion wrote: > > > Hello, > > > > I am new to PETSc and programming. I have a question concerning writing > PETSc matrices in binary from binary matrices [compressed/uncompressed] > generated in Matlab. I am attempting to use the files in the /bin/matlab > directory, in particular the PetscBinaryWrite.m file. However, the usage > > > > PetscBinaryWrite('matrix.mat','output.ex') does not seem to work. I also > tried using the examples in the /mat directory; however, Matlab does not > support writing complex matrices in ASCII. > > > > Thanks in advance, > > > > Tom > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From u.tabak at tudelft.nl Mon Dec 13 11:20:40 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Mon, 13 Dec 2010 18:20:40 +0100 Subject: [petsc-users] Writing PETSc matrices In-Reply-To: References: <22CBB298-B12C-4583-A516-5DCB74576EDF@mcs.anl.gov> Message-ID: <4D0655E8.2030800@tudelft.nl> On 12/13/2010 06:10 PM, Nunion wrote: > What files should one use to convert vectors from Matlab to PETSc > (PetscBinaryWrite is for square matrices)? I have attempted to write > directly to binary from Matlab a matrix + vector, and only a vector > (using the save command with the -mat option), then read the binary file > into PETSc (using ex34.c in ...src/mat/tests); however, the format is > not recognized. > > Thanks, > Tom > Trying to read .mat files in PETSc? That is not possible, AFAIK. Use the provided Matlab interface to write the objects in PETSc binary format. Make sure the matrices are in sparse format; I am not sure whether the same applies to vectors, so you should check. Then you can use these binary files generated in MATLAB in PETSc without problems. HTH, U.
From mhender at us.ibm.com Mon Dec 13 11:56:15 2010 From: mhender at us.ibm.com (Michael E Henderson) Date: Mon, 13 Dec 2010 12:56:15 -0500 Subject: [petsc-users] Solution output of TS Message-ID: Hi, I'm using TS and seeing a "flickering" in the output. I use ierr=TSCreate(PETSC_COMM_WORLD,&timeStepper); ierr=TSSetProblemType(timeStepper,TS_NONLINEAR); ierr=TSSetFromOptions(timeStepper); ierr=TSSetIFunction(timeStepper, formOperatorImplicitTimeFunction , (void*)Ldata); ierr=TSSetIJacobian(timeStepper,A,A, formOperatorImplicitTimeJacobian , (void*)Ldata); ierr=TSMonitorSet(timeStepper,SaveTSSolution,(void*)(&data),NULL); I'm using the implicit nonlinear formulation for an ADE and writing the solution in the monitoring routine. If I instead write out the solution in the IFunction routine when the residual is small, I see a different (better) solution passed in.
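For reference, a monitor registered with TSMonitorSet() has the shape sketched below; the Vec argument is the solution TS hands to the monitor for the current step, which is the quantity being compared here against the state seen inside the IFunction. The body is a placeholder, not the SaveTSSolution routine from this message.

#include "petscts.h"

/* Sketch of a TS monitor (placeholder body): u is the solution vector TS
   passes for the current step; write it out or view it here. */
PetscErrorCode MonitorSketch(TS ts, PetscInt step, PetscReal time, Vec u, void *ctx)
{
  PetscErrorCode ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "step %D, time %g\n", step, (double)time); CHKERRQ(ierr);
  /* e.g. VecView(u, ...) or a binary PetscViewer could be used here */
  return 0;
}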
Does this sound like a problem that's been seen before? Thanks, MIke Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 13 12:01:48 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Dec 2010 18:01:48 +0000 Subject: [petsc-users] class templates in C++ and Petsc functions In-Reply-To: <4D0637AD.6060206@tudelft.nl> References: <4D0637AD.6060206@tudelft.nl> Message-ID: What is the error? Matt On Mon, Dec 13, 2010 at 3:11 PM, Umut Tabak wrote: > Dear all, > > I was trying to write a simple class template code to interface Petsc > matrices along with Boost sparse matrices and vice versa. However, my first > naive try did not work out of the box, since the template class requires > some functions from PETSc libraries and these functions should be located in > the object files to be able to compile. Since class template is header only > and there is no object code generation, it can not find the PETSc library > functions which is logical. I am also new to templates in C++. I just gave a > try to create a template class instead of code duplication however it did > not end up as expected. Are there some smarter ways to accomplish this task. > The function analyzes a boost csr matrix and sets the rows of a Petsc > matrix. T is for boost matrices > and T1 for Mat in Petsc. I can also get some comments on the code ;) > > Best regards, > Umut > > template > int converter::convertMBo2Pe( const T& A, > T1 A_ ){ > PetscErrorCode ierr; > int cntNnz = 0; > typedef typename T::iterator1 i1_t; > typedef typename T::iterator2 i2_t; > //int nnz[ooFelieMatrix.size1()]; > int nnz[A.size1()]; > unsigned ind=0; > //get information about the matrix > > double* vals = NULL; > for (i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) > { > nnz[ind] = distance(i1.begin(), i1.end()); > ind++; > } > // create the matrix depending > // on the values of the nonzeros > // on each row > ierr = MatCreateSeqAIJ( PETSC_COMM_SELF, A.size1(), > A.size2(), cntNnz, nnz, > A_ ); > PetscInt rInd = 0, cInd=0; > PetscInt* rCount, dummy; > rCount = &dummy; > // pointer to values in a row > PetscScalar* valsOfRowI = NULL; > PetscInt* colIndexOfRowI = NULL; > PetscInt rC = 1; > for(i1_t i1 = A.begin1(); i1 != A.end1(); ++i1) > { > // allocate space for the values of row I > valsOfRowI = new PetscScalar[nnz[rInd]]; > colIndexOfRowI = new PetscInt[nnz[rInd]]; > for(i2_t i2 = i1.begin(); i2 != i1.end(); ++i2) > { > colIndexOfRowI[cInd] = i2.index2(); > valsOfRowI[cInd] = *i2; > cInd++; > } > // setting one row each time > *rCount = rInd; > MatSetValues( A_, rC, rCount, nnz[rInd], > colIndexOfRowI, valsOfRowI, > INSERT_VALUES ); > // delete > delete [] valsOfRowI; > delete [] colIndexOfRowI; > rInd++; cInd = 0; > } > // > MatAssemblyBegin( A_, MAT_FINAL_ASSEMBLY ); > MatAssemblyEnd( A_, MAT_FINAL_ASSEMBLY ); > // return > return 0; > } > > -- > - Hope is a good thing, maybe the best of things > and no good thing ever dies... > The Shawshank Redemption, replique of Tim Robbins > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 13 21:01:37 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 13 Dec 2010 21:01:37 -0600 Subject: [petsc-users] MatMult In-Reply-To: <1292228457.1803.34.camel@desktop> References: <1292077938.2074.38.camel@desktop> <05F277AE-DA79-4D81-A573-E74C53DE58B3@mcs.anl.gov> <1292225356.1803.4.camel@desktop> <1292228457.1803.34.camel@desktop> Message-ID: <681B1F8F-BB5F-4EBF-831E-1227218ED3D0@mcs.anl.gov> Runs ok for me. Barry On Dec 13, 2010, at 2:20 AM, Jakub Pola wrote: > Could you please check the file attached to this email. there is source > code and log summary from execution of mat mult. > > When I run the ex131 with parameters -vec_type cuda and -mat_type > seqaijcuda > > mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda > -vec_type cuda -log_summary > > it fails because of CUDA Error 4. see MatMultKO.log > > > When I run the same program without -vec_type cuda parameter only with > -mat_type seqaijcuda it run ok. > mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda > -log_summary > > MatMltOK.log > > When I run without -math_type seqaijcuda only with -vec_type cuda it > fails again because > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): invalid argument > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): invalid argument > -------------------------------------------------------------------------- > mpiexec noticed that process rank 0 with PID 3755 on node desktop exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > > Could you please give me some comments on that > > Dnia 2010-12-13, pon o godzinie 07:37 +0000, Matthew Knepley pisze: >> Yes, it should run on the GPU. Check an example, like ex19. >> >> >> Matt >> >> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola >> wrote: >> Hi, >> >> Does MatMult function is performed on GPU? when I prepared >> program which >> just executes this function with parameters -vec_type cuda and >> -mat_type >> seqaijcuda i havent seen in summary log any VecCUDACopyTo >> entry >> >> >> Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith >> pisze: >> >> >>> To answer this you need to understand that PETSc copies >> vectors and matrices to the GPU memory "on demand" (that is >> exactly when they are first needed on the GPU, and not before) >> and once it has copied to the GPU it keeps track of it and >> will NOT copy it down again if it is already there. >>> >>> Hence in your run below, yes it includes the copy time >> down. >>> >>> But note that ONE multiply on the GPU is absurd, it does >> not make sense to copy a matrix down to the GPU and then do >> ONE multiply with it. Thus I NEVER do "sandalone" benchmarking >> where a single kernel is called by it self once, the time >> results are useless. Always run a FULL application with >> -log_summary; for example in this case a full KSPSolve() that >> requires a bunch of iterations. Then you can look at the >> performance of each kernel. The reason to do it this way is >> that the numbers can be very different and what matters is >> runs in APPLICATIONS so that is what should be measured. >>> >>> If say you run KSP with 20 iterations then the time to >> copy the matrix down to the GPU is amortized over those 20 >> iterations and thus maybe ok. 
You should see the flop rate for >> the MatMult() go up in this case. >>> >>> You may have noticed we have a log entry for >> VecCopyToGPU() we will be adding one for matrices as well thus >> you will be able to see how long the copy time is but not that >> the copy time is still counted in the MatMult() time if the >> first copy of the matrix to GPU is triggered by the MatMult. >> You can subtract the copy time from the mult time to get the >> per multiply time, this would correspond to the multiply time >> in the limit of a single copy down and many, many multiplies >> on the GPU. >>> >>> Barry >>> >>> >>> >>> >>> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote: >>> >>>> Hello again, >>>> >>>> I compiled one of te examples. I used sparse matix called >> 02-raefsky3. >>>> I used -vec_type cuda and -mat_type seqaijcuda. >>>> >>>> When I see summary of the operations performed by program >> there is >>>> >>>> MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2100 >>>> 0 0 0 2100 0 0 0 147 >>>> >>>> Does time of performing MatMult includes memory transfer >> for loading >>>> matrix in GPU memory or just exact computation time? >>>> >>>> Thanks in advance. >>>> Kuba. >>>> >>> >> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > From mmnasr at gmail.com Mon Dec 13 22:42:31 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Mon, 13 Dec 2010 20:42:31 -0800 Subject: [petsc-users] KSP solver and Distributed arrays Message-ID: Hi guys, A simple question. Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and rhs, solution vectors with different number of ghost nodes (width)? As an example, can it be possible to solve for vector {x} DA, width=3, STAR stencil, [A] DA, width=1, STAR stencil {b} DA, width=1, STAR stencil I am suspecting that it is not possible. Thanks, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmnasr at gmail.com Tue Dec 14 04:15:21 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Tue, 14 Dec 2010 02:15:21 -0800 Subject: [petsc-users] Updating the ghost nodes for distributed arrays Message-ID: Hi guys, Is it possible to update the ghost values from a global to a local vector for distributed arrays when global and local vectors are not from the same DA, but the global vectors are the same? This is the the code that I have, (the only difference between the two DA's is the width. So, I am assuming that any global vector created based on those are going to be the same) G_data is created based on DA_3D, whereas L_data2 is created based on DA_3D2. Vec G_data, L_data; Vec G_data2, L_data2; ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D); ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D2); ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); /* =====> Is this possible? 
*/ ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); Thanks, Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 14 08:11:09 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 14 Dec 2010 08:11:09 -0600 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: The linear solvers only know about global dimensions and local sizes (they know nothing about local representations) so if the vectors and matrix are compatible the answer is yes. Barry On Dec 13, 2010, at 10:42 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > A simple question. > Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and rhs, solution vectors with different number of ghost nodes (width)? > As an example, can it be possible to solve for > vector {x} DA, width=3, STAR stencil, > [A] DA, width=1, STAR stencil > {b} DA, width=1, STAR stencil > > I am suspecting that it is not possible. > > Thanks, > Mohamad > From bsmith at mcs.anl.gov Tue Dec 14 08:12:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 14 Dec 2010 08:12:16 -0600 Subject: [petsc-users] Updating the ghost nodes for distributed arrays In-Reply-To: References: Message-ID: On Dec 14, 2010, at 4:15 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > Is it possible to update the ghost values from a global to a local vector for distributed arrays when global and local vectors are not from the same DA, but the global vectors are the same? Yes > This is the the code that I have, (the only difference between the two DA's is the width. So, I am assuming that any global vector created based on those are going to be the same) Yes > > G_data is created based on DA_3D, whereas L_data2 is created based on DA_3D2. > > > Vec G_data, L_data; > Vec G_data2, L_data2; > > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D); > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, PETSC_NULL, PETSC_NULL, &DA_3D2); > > ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); > ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); > > ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); > ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); > > /* =====> Is this possible? */ > ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); > ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, L_data2);CHKERRQ(ierr); > > > Thanks, > Mohamad > From knepley at gmail.com Tue Dec 14 09:14:04 2010 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Dec 2010 07:14:04 -0800 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: Yes that is possible. The solver does not know about ghost nodes. Matt On Mon, Dec 13, 2010 at 8:42 PM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > A simple question. > Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix and > rhs, solution vectors with different number of ghost nodes (width)? > As an example, can it be possible to solve for > vector {x} DA, width=3, STAR stencil, > [A] DA, width=1, STAR stencil > {b} DA, width=1, STAR stencil > > I am suspecting that it is not possible. 
> > Thanks, > Mohamad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Wed Dec 15 13:09:59 2010 From: ecoon at lanl.gov (Ethan Coon) Date: Wed, 15 Dec 2010 12:09:59 -0700 Subject: [petsc-users] IS from DA by coordinates Message-ID: <1292440199.15255.38.camel@hahn.lanl.gov> Hi all, Is there a cleaner way to create an IS to a global vector on a DA for a subset of nodes using a coordinate value than the following? (Pardon my pseudo-code combo of python and c and imprecise arguments) // get the coordinates of the nodes DAGetCoordinateDA(da, dac) DAGetCoordinates(da, vecc) DAVecGetArray(dac, vecc, vecc_a) // generate a one-dof da with no ghosts and the same parallel // structure as the da DAGetOwnershipRanges(da, lx[], ly[], lz[]) DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ lz[], da_one) // get the global indices of the one-dof da, noting that because // we set the stencil size to zero, the local array of global // indices is the same size as the local portion of the global array // of coordinates DAGetCorners(xs, ys, zs, xl, yl, zl) DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) // loop over the local array returned by // DAGetGlobalIndices and the coordinates, // comparing to the test global_block_indices = [] for i in range(xs, xs+xl): for j in range(ys, ys+yl): for k in range(zs, zs+zl): if vecc_a[k,j,i,:] == whatever_coordinate: global_block_indices.append(indices[k,j,i]) // make the IS ISCreateBlock(comm, ndofs, global_block_indices, coord_is) // restore and destroy etc. This just seems quite complicated with the construction of the one-dof da to get global indices of the block. There might be a better way with ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? Thanks, Ethan -- ------------------------------------- Ethan Coon Post-Doctoral Researcher Mathematical Modeling and Analysis Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------- From bsmith at mcs.anl.gov Wed Dec 15 13:26:18 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 13:26:18 -0600 Subject: [petsc-users] IS from DA by coordinates In-Reply-To: <1292440199.15255.38.camel@hahn.lanl.gov> References: <1292440199.15255.38.camel@hahn.lanl.gov> Message-ID: <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> Ethan, I don't think there is any reason to create a new DA or call DAGetGlobalIndices().. Just call DAGetLocalToGlobalMappingBlock() on the original DA Then for i in range(xs, xs+xl): for j in range(ys, ys+yl): for k in range(zs, zs+zl): if vecc_a[k,j,i,:] == whatever_coordinate: local_indices.append( convert i,j,k, to local numbering with something like (k-zs)*mx*my + (j-ys)*mx + .. ... Now apply ISLocalToGlobalMappingApply to local_indices and you have a list of global indices depending on what you want you do with this beast you may need to scale by bs or 1/bs Barry On Dec 15, 2010, at 1:09 PM, Ethan Coon wrote: > Hi all, > > Is there a cleaner way to create an IS to a global vector on a DA for a > subset of nodes using a coordinate value than the following? 
(Pardon my > pseudo-code combo of python and c and imprecise arguments) > > // get the coordinates of the nodes > DAGetCoordinateDA(da, dac) > DAGetCoordinates(da, vecc) > DAVecGetArray(dac, vecc, vecc_a) > > // generate a one-dof da with no ghosts and the same parallel > // structure as the da > DAGetOwnershipRanges(da, lx[], ly[], lz[]) > DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ > lz[], da_one) > > // get the global indices of the one-dof da, noting that because > // we set the stencil size to zero, the local array of global > // indices is the same size as the local portion of the global array > // of coordinates > DAGetCorners(xs, ys, zs, xl, yl, zl) > DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) > > // loop over the local array returned by > // DAGetGlobalIndices and the coordinates, > // comparing to the test > global_block_indices = [] > for i in range(xs, xs+xl): > for j in range(ys, ys+yl): > for k in range(zs, zs+zl): > if vecc_a[k,j,i,:] == whatever_coordinate: > global_block_indices.append(indices[k,j,i]) > > > // make the IS > ISCreateBlock(comm, ndofs, global_block_indices, coord_is) > > // restore and destroy etc. > > This just seems quite complicated with the construction of the one-dof > da to get global indices of the block. There might be a better way with > ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? > > Thanks, > > Ethan > > > > > -- > ------------------------------------- > Ethan Coon > Post-Doctoral Researcher > Mathematical Modeling and Analysis > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------- > From ecoon at lanl.gov Wed Dec 15 15:04:29 2010 From: ecoon at lanl.gov (Ethan Coon) Date: Wed, 15 Dec 2010 14:04:29 -0700 Subject: [petsc-users] IS from DA by coordinates In-Reply-To: <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> References: <1292440199.15255.38.camel@hahn.lanl.gov> <9367973E-331F-4608-8CCE-BF23BAA9F30B@mcs.anl.gov> Message-ID: <1292447069.15255.97.camel@hahn.lanl.gov> On Wed, 2010-12-15 at 13:26 -0600, Barry Smith wrote: > Ethan, > > I don't think there is any reason to create a new DA or call DAGetGlobalIndices().. Just call DAGetLocalToGlobalMappingBlock() on the original DA Then > > for i in range(xs, xs+xl): > for j in range(ys, ys+yl): > for k in range(zs, zs+zl): > if vecc_a[k,j,i,:] == whatever_coordinate: > local_indices.append( convert i,j,k, to local numbering with something like (k-zs)*mx*my + (j-ys)*mx + .. > ... > > Now apply ISLocalToGlobalMappingApply Great, this is what I was missing. This should do the trick. Thanks Barry. Ethan > to local_indices and you have a list of global indices depending on what you want you do with this beast you may need to scale by bs or 1/bs > > Barry > > > On Dec 15, 2010, at 1:09 PM, Ethan Coon wrote: > > > Hi all, > > > > Is there a cleaner way to create an IS to a global vector on a DA for a > > subset of nodes using a coordinate value than the following? 
(Pardon my > > pseudo-code combo of python and c and imprecise arguments) > > > > // get the coordinates of the nodes > > DAGetCoordinateDA(da, dac) > > DAGetCoordinates(da, vecc) > > DAVecGetArray(dac, vecc, vecc_a) > > > > // generate a one-dof da with no ghosts and the same parallel > > // structure as the da > > DAGetOwnershipRanges(da, lx[], ly[], lz[]) > > DACreate3D(comm, M,N,P,len(lx),len(ly),len(lz),1,0, lx[], ly[], \ > > lz[], da_one) > > > > // get the global indices of the one-dof da, noting that because > > // we set the stencil size to zero, the local array of global > > // indices is the same size as the local portion of the global array > > // of coordinates > > DAGetCorners(xs, ys, zs, xl, yl, zl) > > DAGetGlobalIndices(da_one, xl*yl*zl, indices[]) > > > > // loop over the local array returned by > > // DAGetGlobalIndices and the coordinates, > > // comparing to the test > > global_block_indices = [] > > for i in range(xs, xs+xl): > > for j in range(ys, ys+yl): > > for k in range(zs, zs+zl): > > if vecc_a[k,j,i,:] == whatever_coordinate: > > global_block_indices.append(indices[k,j,i]) > > > > > > // make the IS > > ISCreateBlock(comm, ndofs, global_block_indices, coord_is) > > > > // restore and destroy etc. > > > > This just seems quite complicated with the construction of the one-dof > > da to get global indices of the block. There might be a better way with > > ISLocalToGlobalMapping, but I wasn't sure how. Any suggestions? > > > > Thanks, > > > > Ethan > > > > > > > > > > -- > > ------------------------------------- > > Ethan Coon > > Post-Doctoral Researcher > > Mathematical Modeling and Analysis > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------- > > > -- ------------------------------------- Ethan Coon Post-Doctoral Researcher Mathematical Modeling and Analysis Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------- From vijay.m at gmail.com Wed Dec 15 18:06:53 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 18:06:53 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. Message-ID: Hi, I have an implementation issue with the MatRestrict/Interpolate functions. The problem is that one of my coarser levels (with PCMG) has higher dofs than the finest level. This does not always happen and requires a weird fine mesh system (in a sense) that uses multi-grid, but the idea is that the finest level problem has a high order (HO) discretization while the lower level mesh has a linear tesselation of the finest HO level (which I can optimize) and then adaptively coarsened levels beyond that. Since the number of columns in this case is larger than the number of rows, MatRestrict invariably calls MatMultTranspose to multiply instead of MatMult and vice-versa while calling MatInterpolate. These result in assertion errors while comparing the length of Mat and Vec. The chosen method is based on whether (M>N) which seems to act against what I am doing here... I can always implement a shell matrix to replicate Restrict/Interpolate actions but my question is whether if such discretization will yield a consistent convergence in MG algorithm ? Is there a strong reason for checking if (M>N) rather than just doing (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? 
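For reference, the shell-matrix route mentioned above can be wired into PCMG roughly as sketched below. This is not code from this thread: ApplyRestriction, ApplyInterpolation, nc, nf and level are placeholders, the sizes are written as if for a single process, and only the wiring is shown, not the discretization-specific kernels. Handing PCMG both transfers explicitly avoids leaving it to infer one from the other.

#include "petscksp.h"
/* Depending on the PETSc version, the PCMG calls below may need an extra
   header such as petscpcmg.h. */

/* y = R x : fine -> coarse (user kernel goes here) */
PetscErrorCode ApplyRestriction(Mat R, Vec xfine, Vec ycoarse)
{
  /* ... apply the user-defined restriction ... */
  return 0;
}

/* y = R^T x : coarse -> fine (user kernel goes here) */
PetscErrorCode ApplyInterpolation(Mat R, Vec xcoarse, Vec yfine)
{
  /* ... apply the user-defined prolongation ... */
  return 0;
}

/* Wrap the two kernels in one nc-by-nf MATSHELL and register it for a level. */
PetscErrorCode SetLevelTransfers(PC pc, PetscInt level, PetscInt nc, PetscInt nf)
{
  Mat            R;
  PetscErrorCode ierr;

  ierr = MatCreateShell(PETSC_COMM_WORLD, nc, nf, nc, nf, PETSC_NULL, &R); CHKERRQ(ierr);
  ierr = MatShellSetOperation(R, MATOP_MULT,           (void (*)(void))ApplyRestriction);   CHKERRQ(ierr);
  ierr = MatShellSetOperation(R, MATOP_MULT_TRANSPOSE, (void (*)(void))ApplyInterpolation); CHKERRQ(ierr);

  /* Which of MatMult()/MatMultTranspose() gets applied inside PCMG's
     MatRestrict()/MatInterpolate() is decided by the size checks discussed
     in this thread. */
  ierr = PCMGSetRestriction(pc, level, R);   CHKERRQ(ierr);
  ierr = PCMGSetInterpolation(pc, level, R); CHKERRQ(ierr);
  /* MatDestroy() the local handle R when done; its calling sequence differs
     between petsc-3.1 and later releases */
  return 0;
}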
I would appreciate any detailed answer that you can provide for this and any suggestions to use the existing methods (without implementing the shell restriction) is very welcome. Thanks, vijay From mmnasr at gmail.com Wed Dec 15 19:52:54 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Wed, 15 Dec 2010 17:52:54 -0800 Subject: [petsc-users] Updating the ghost nodes for distributed arrays In-Reply-To: References: Message-ID: Thanks Barry for your help. M On Tue, Dec 14, 2010 at 6:12 AM, Barry Smith wrote: > > On Dec 14, 2010, at 4:15 AM, Mohamad M. Nasr-Azadani wrote: > > > Hi guys, > > > > Is it possible to update the ghost values from a global to a local vector > for distributed arrays when global and local vectors are not from the same > DA, but the global vectors are the same? > > Yes > > > This is the the code that I have, (the only difference between the two > DA's is the width. So, I am assuming that any global vector created based on > those are going to be the same) > > Yes > > > > > > G_data is created based on DA_3D, whereas L_data2 is created based on > DA_3D2. > > > > > > Vec G_data, L_data; > > Vec G_data2, L_data2; > > > > > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, > PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width, PETSC_NULL, PETSC_NULL, > PETSC_NULL, &DA_3D); > > ierr = DACreate3d(PCW, DA_NONPERIODIC, DA_STENCIL_STAR, NX, NY, NZ, > PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, width+2, PETSC_NULL, > PETSC_NULL, PETSC_NULL, &DA_3D2); > > > > ierr = DACreateGlobalVector(DA_3D, &G_data); CHKERRQ(ierr); > > ierr = DACreateLocalVector(DA_3D, &L_data); CHKERRQ(ierr); > > > > ierr = DACreateGlobalVector(DA_3D2, &G_data2); CHKERRQ(ierr); > > ierr = DACreateLocalVector(DA_3D2, &L_data2); CHKERRQ(ierr); > > > > /* =====> Is this possible? */ > > ierr = DAGlobalToLocalBegin(DA_3D2, G_data, INSERT_VALUES, > L_data2);CHKERRQ(ierr); > > ierr = DAGlobalToLocalEnd(DA_3D2, G_data, INSERT_VALUES, > L_data2);CHKERRQ(ierr); > > > > > > Thanks, > > Mohamad > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 15 19:53:05 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 19:53:05 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: Message-ID: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> Vijay, The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. I'll try to get it down in the next few hours but it may take a little longer. Barry On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: > Hi, > > I have an implementation issue with the MatRestrict/Interpolate > functions. The problem is that one of my coarser levels (with PCMG) > has higher dofs than the finest level. This does not always happen and > requires a weird fine mesh system (in a sense) that uses multi-grid, > but the idea is that the finest level problem has a high order (HO) > discretization while the lower level mesh has a linear tesselation of > the finest HO level (which I can optimize) and then adaptively > coarsened levels beyond that. 
Since the number of columns in this case > is larger than the number of rows, MatRestrict invariably calls > MatMultTranspose to multiply instead of MatMult and vice-versa while > calling MatInterpolate. These result in assertion errors while > comparing the length of Mat and Vec. The chosen method is based on > whether (M>N) which seems to act against what I am doing here... > > I can always implement a shell matrix to replicate > Restrict/Interpolate actions but my question is whether if such > discretization will yield a consistent convergence in MG algorithm ? > Is there a strong reason for checking if (M>N) rather than just doing > (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would > appreciate any detailed answer that you can provide for this and any > suggestions to use the existing methods (without implementing the > shell restriction) is very welcome. > > Thanks, > vijay From mmnasr at gmail.com Wed Dec 15 19:53:20 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Wed, 15 Dec 2010 17:53:20 -0800 Subject: [petsc-users] KSP solver and Distributed arrays In-Reply-To: References: Message-ID: Thank you Barry and Matthew. Mohamad On Tue, Dec 14, 2010 at 7:14 AM, Matthew Knepley wrote: > Yes that is possible. The solver does not know about ghost nodes. > > Matt > > > On Mon, Dec 13, 2010 at 8:42 PM, Mohamad M. Nasr-Azadani > wrote: > >> Hi guys, >> >> A simple question. >> Can I solve a linear system [A]{x} = {b} using KSP solvers using Matrix >> and rhs, solution vectors with different number of ghost nodes (width)? >> As an example, can it be possible to solve for >> vector {x} DA, width=3, STAR stencil, >> [A] DA, width=1, STAR stencil >> {b} DA, width=1, STAR stencil >> >> I am suspecting that it is not possible. >> >> Thanks, >> Mohamad >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 15 20:04:08 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 20:04:08 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> Message-ID: <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> I have pushed this change to petsc-dev and it is ready for use. Barry Note it can still glitch if the restricted size is exactly the original size. :-( On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: > > Vijay, > > The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. > > I'll try to get it down in the next few hours but it may take a little longer. > > > Barry > > On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: > >> Hi, >> >> I have an implementation issue with the MatRestrict/Interpolate >> functions. The problem is that one of my coarser levels (with PCMG) >> has higher dofs than the finest level. 
This does not always happen and >> requires a weird fine mesh system (in a sense) that uses multi-grid, >> but the idea is that the finest level problem has a high order (HO) >> discretization while the lower level mesh has a linear tesselation of >> the finest HO level (which I can optimize) and then adaptively >> coarsened levels beyond that. Since the number of columns in this case >> is larger than the number of rows, MatRestrict invariably calls >> MatMultTranspose to multiply instead of MatMult and vice-versa while >> calling MatInterpolate. These result in assertion errors while >> comparing the length of Mat and Vec. The chosen method is based on >> whether (M>N) which seems to act against what I am doing here... >> >> I can always implement a shell matrix to replicate >> Restrict/Interpolate actions but my question is whether if such >> discretization will yield a consistent convergence in MG algorithm ? >> Is there a strong reason for checking if (M>N) rather than just doing >> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >> appreciate any detailed answer that you can provide for this and any >> suggestions to use the existing methods (without implementing the >> shell restriction) is very welcome. >> >> Thanks, >> vijay > From vijay.m at gmail.com Wed Dec 15 20:16:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 20:16:37 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> Message-ID: Barry, Thanks for the prompt change ! I do not work on the development version but I can update these matrix routines alone. > ?Note it can still glitch if the restricted size is exactly the original size. :-( Why would it glitch if the restricted size is the same as the original size though ? I dont see a case where your check (M==Ny) would fail. Can you please elaborate more on this ? Vijay On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: > > ?I have pushed this change to petsc-dev and it is ready for use. > > ? Barry > > ?Note it can still glitch if the restricted size is exactly the original size. :-( > > > On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: > >> >> ?Vijay, >> >> ? ?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >> >> ? I'll try to get it down in the next few hours but it may take a little longer. >> >> >> ? Barry >> >> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >> >>> Hi, >>> >>> I have an implementation issue with the MatRestrict/Interpolate >>> functions. The problem is that one of my coarser levels (with PCMG) >>> has higher dofs than the finest level. This does not always happen and >>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>> but the idea is that the finest level problem has a high order (HO) >>> discretization while the lower level mesh has a linear tesselation of >>> the finest HO level (which I can optimize) and then adaptively >>> coarsened levels beyond that. 
Since the number of columns in this case >>> is larger than the number of rows, MatRestrict invariably calls >>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>> calling ?MatInterpolate. These result in assertion errors while >>> comparing the length of Mat and Vec. The chosen method is based on >>> whether (M>N) which seems to act against what I am doing here... >>> >>> I can always implement a shell matrix to replicate >>> Restrict/Interpolate actions but my question is whether if such >>> discretization will yield a consistent convergence in MG algorithm ? >>> Is there a strong reason for checking if (M>N) rather than just doing >>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>> appreciate any detailed answer that you can provide for this and any >>> suggestions to use the existing methods (without implementing the >>> shell restriction) is very welcome. >>> >>> Thanks, >>> vijay >> > > From bsmith at mcs.anl.gov Wed Dec 15 20:36:53 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 20:36:53 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> Message-ID: <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: > Barry, > > Thanks for the prompt change ! I do not work on the development > version but I can update these matrix routines alone. > >> Note it can still glitch if the restricted size is exactly the original size. :-( > > Why would it glitch if the restricted size is the same as the original > size though ? I dont see a case where your check (M==Ny) would fail. > Can you please elaborate more on this ? Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. Barry > > Vijay > > On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >> >> I have pushed this change to petsc-dev and it is ready for use. >> >> Barry >> >> Note it can still glitch if the restricted size is exactly the original size. :-( >> >> >> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >> >>> >>> Vijay, >>> >>> The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>> >>> I'll try to get it down in the next few hours but it may take a little longer. >>> >>> >>> Barry >>> >>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>> >>>> Hi, >>>> >>>> I have an implementation issue with the MatRestrict/Interpolate >>>> functions. The problem is that one of my coarser levels (with PCMG) >>>> has higher dofs than the finest level. This does not always happen and >>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>> but the idea is that the finest level problem has a high order (HO) >>>> discretization while the lower level mesh has a linear tesselation of >>>> the finest HO level (which I can optimize) and then adaptively >>>> coarsened levels beyond that. 
Since the number of columns in this case >>>> is larger than the number of rows, MatRestrict invariably calls >>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>> calling MatInterpolate. These result in assertion errors while >>>> comparing the length of Mat and Vec. The chosen method is based on >>>> whether (M>N) which seems to act against what I am doing here... >>>> >>>> I can always implement a shell matrix to replicate >>>> Restrict/Interpolate actions but my question is whether if such >>>> discretization will yield a consistent convergence in MG algorithm ? >>>> Is there a strong reason for checking if (M>N) rather than just doing >>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>> appreciate any detailed answer that you can provide for this and any >>>> suggestions to use the existing methods (without implementing the >>>> shell restriction) is very welcome. >>>> >>>> Thanks, >>>> vijay >>> >> >> From vijay.m at gmail.com Wed Dec 15 21:12:21 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 15 Dec 2010 21:12:21 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> Message-ID: > ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. hmm, in the case when I provide a matrix for both Interpolate/Restrict, I have (M==N==Nx==Ny), this would always call the MatMult routine. As long as I provide both these operators/matrices explicitly, there is still no problem. A possible issue is only when someone provides just the restriction or prolongation. But I understand that when this happens, the other operator is computed explicitly as its transpose. If this is actual implementation, I still dont see a problem. Although, if the restriction/prolongation operator are implicitly assumed to be transpose of the other, then it will quite horribly fail since only MatMult is called for both. I am not completely sure about the mode currently used in petsc but it would be great if you can help me understand. Note: I probe more on this since my linear tesselation (most often) results in the same number of dofs on the coarser level (p-coarsened/h-refined) and I dont want a glitch to come back and bite me later on.. Vijay On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: > > On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Thanks for the prompt change ! I do not work on the development >> version but I can update these matrix routines alone. >> >>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >> >> Why would it glitch if the restricted size is the same as the original >> size though ? I dont see a case where your check (M==Ny) would fail. >> Can you please elaborate more on this ? > > ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. > > ?Barry > >> >> Vijay >> >> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>> >>> ?I have pushed this change to petsc-dev and it is ready for use. >>> >>> ? Barry >>> >>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>> >>> >>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>> >>>> >>>> ?Vijay, >>>> >>>> ? 
?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>> >>>> ? I'll try to get it down in the next few hours but it may take a little longer. >>>> >>>> >>>> ? Barry >>>> >>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Hi, >>>>> >>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>> has higher dofs than the finest level. This does not always happen and >>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>> but the idea is that the finest level problem has a high order (HO) >>>>> discretization while the lower level mesh has a linear tesselation of >>>>> the finest HO level (which I can optimize) and then adaptively >>>>> coarsened levels beyond that. Since the number of columns in this case >>>>> is larger than the number of rows, MatRestrict invariably calls >>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>> calling ?MatInterpolate. These result in assertion errors while >>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>> whether (M>N) which seems to act against what I am doing here... >>>>> >>>>> I can always implement a shell matrix to replicate >>>>> Restrict/Interpolate actions but my question is whether if such >>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>>> appreciate any detailed answer that you can provide for this and any >>>>> suggestions to use the existing methods (without implementing the >>>>> shell restriction) is very welcome. >>>>> >>>>> Thanks, >>>>> vijay >>>> >>> >>> > > From bsmith at mcs.anl.gov Wed Dec 15 21:16:28 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 15 Dec 2010 21:16:28 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> Message-ID: <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> If you explicitly provide both then you are ok. If you provide one and not the other it will fail silently with a bad algorithm. Barry On Dec 15, 2010, at 9:12 PM, Vijay S. Mahadevan wrote: >> Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. > > hmm, in the case when I provide a matrix for both > Interpolate/Restrict, I have (M==N==Nx==Ny), this would always call > the MatMult routine. As long as I provide both these > operators/matrices explicitly, there is still no problem. A possible > issue is only when someone provides just the restriction or > prolongation. But I understand that when this happens, the other > operator is computed explicitly as its transpose. If this is actual > implementation, I still dont see a problem. 
Although, if the > restriction/prolongation operator are implicitly assumed to be > transpose of the other, then it will quite horribly fail since only > MatMult is called for both. I am not completely sure about the mode > currently used in petsc but it would be great if you can help me > understand. > > Note: I probe more on this since my linear tesselation (most often) > results in the same number of dofs on the coarser level > (p-coarsened/h-refined) and I dont want a glitch to come back and bite > me later on.. > > Vijay > > On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: >> >> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Thanks for the prompt change ! I do not work on the development >>> version but I can update these matrix routines alone. >>> >>>> Note it can still glitch if the restricted size is exactly the original size. :-( >>> >>> Why would it glitch if the restricted size is the same as the original >>> size though ? I dont see a case where your check (M==Ny) would fail. >>> Can you please elaborate more on this ? >> >> Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. >> >> Barry >> >>> >>> Vijay >>> >>> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>>> >>>> I have pushed this change to petsc-dev and it is ready for use. >>>> >>>> Barry >>>> >>>> Note it can still glitch if the restricted size is exactly the original size. :-( >>>> >>>> >>>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>>> >>>>> >>>>> Vijay, >>>>> >>>>> The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>>> >>>>> I'll try to get it down in the next few hours but it may take a little longer. >>>>> >>>>> >>>>> Barry >>>>> >>>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>>> has higher dofs than the finest level. This does not always happen and >>>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>>> but the idea is that the finest level problem has a high order (HO) >>>>>> discretization while the lower level mesh has a linear tesselation of >>>>>> the finest HO level (which I can optimize) and then adaptively >>>>>> coarsened levels beyond that. Since the number of columns in this case >>>>>> is larger than the number of rows, MatRestrict invariably calls >>>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>>> calling MatInterpolate. These result in assertion errors while >>>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>>> whether (M>N) which seems to act against what I am doing here... >>>>>> >>>>>> I can always implement a shell matrix to replicate >>>>>> Restrict/Interpolate actions but my question is whether if such >>>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? 
I would >>>>>> appreciate any detailed answer that you can provide for this and any >>>>>> suggestions to use the existing methods (without implementing the >>>>>> shell restriction) is very welcome. >>>>>> >>>>>> Thanks, >>>>>> vijay >>>>> >>>> >>>> >> >> From vijay.m at gmail.com Thu Dec 16 09:00:48 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 16 Dec 2010 09:00:48 -0600 Subject: [petsc-users] Use of MatRestrict/MatInterpolate with PCMG. In-Reply-To: <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> References: <692C91A3-E955-406E-B0DD-D014F7868065@mcs.anl.gov> <0FAF910B-9684-44D7-BD33-9F3DE15F6391@mcs.anl.gov> <49390E07-E5CE-4ABD-9BEA-05954726E2C7@mcs.anl.gov> <95AB2BDE-871C-4F3A-B096-5E26030D7080@mcs.anl.gov> Message-ID: Yes, I understand. But isn't there a flag to check if the transpose was implicitly assumed ? It might be bad to compute the transpose of the operator not provided, explicitly but the interface that calls Restrict/Interpolate on Mat such as PCMG or DMMG should probably know this and make the call appropriately. Or you could set the Mat operations for the transposed operator accordingly i.e., MatMult_operator = MatMultTranspose; MatMultTranspose_operator = MatMult. These are just couple of workarounds to tackle the glitch. Barry, if you make further changes on this, I would much appreciate it if you can let me know. Thanks. Vijay On Wed, Dec 15, 2010 at 9:16 PM, Barry Smith wrote: > > ?If you explicitly provide both then you are ok. If you provide one and not the other it will fail silently with a bad algorithm. > > > ?Barry > > On Dec 15, 2010, at 9:12 PM, Vijay S. Mahadevan wrote: > >>> ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. >> >> hmm, in the case when I provide a matrix for both >> Interpolate/Restrict, I have (M==N==Nx==Ny), ?this would always call >> the MatMult routine. As long as I provide both these >> operators/matrices explicitly, there is still no problem. A possible >> issue is only when someone provides just the restriction or >> prolongation. But I understand that when this happens, the other >> operator is computed explicitly as its transpose. If this is actual >> implementation, I still dont see a problem. Although, if the >> restriction/prolongation operator are implicitly assumed to be >> transpose of the other, then it will quite horribly fail since only >> MatMult is called for both. I am not completely sure about the mode >> currently used in petsc but it would be great if you can help me >> understand. >> >> Note: I probe more on this since my linear tesselation (most often) >> results in the same number of dofs on the coarser level >> (p-coarsened/h-refined) and I dont want a glitch to come back and bite >> me later on.. >> >> Vijay >> >> On Wed, Dec 15, 2010 at 8:36 PM, Barry Smith wrote: >>> >>> On Dec 15, 2010, at 8:16 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Thanks for the prompt change ! I do not work on the development >>>> version but I can update these matrix routines alone. >>>> >>>>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>>> >>>> Why would it glitch if the restricted size is the same as the original >>>> size though ? I dont see a case where your check (M==Ny) would fail. >>>> Can you please elaborate more on this ? >>> >>> ?Well if they happen to be equal then it will never apply the transpose thus giving a bad algorithm and garbage. 
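The shell-matrix route Vijay mentions combines naturally with Barry's advice to provide both transfer operators explicitly, so that neither MatRestrict() nor MatInterpolate() ever has to guess an apply direction. The fragment below is only a rough sketch, not code from this thread: UserRestrict(), UserInterpolate(), the UserCtx struct and the size variables nc/nf/Nc/Nf (plus pc, level, ksp-side variables) are hypothetical stand-ins for the application's own transfer routines and grid sizes.

#include "petscksp.h"

typedef struct { void *grids; /* whatever the transfer routines need */ } UserCtx;

/* ycoarse = R * xfine : the application's restriction */
PetscErrorCode RestrictMult(Mat R,Vec xfine,Vec ycoarse)
{
  UserCtx        *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(R,(void**)&user);CHKERRQ(ierr);
  ierr = UserRestrict(user,xfine,ycoarse);CHKERRQ(ierr);      /* hypothetical routine */
  return 0;
}

/* yfine = P * xcoarse : the application's interpolation */
PetscErrorCode InterpMult(Mat P,Vec xcoarse,Vec yfine)
{
  UserCtx        *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(P,(void**)&user);CHKERRQ(ierr);
  ierr = UserInterpolate(user,xcoarse,yfine);CHKERRQ(ierr);   /* hypothetical routine */
  return 0;
}

/* Rmat is (coarse x fine), Pmat is (fine x coarse); each is applied with plain MatMult */
Mat Rmat,Pmat;
ierr = MatCreateShell(PETSC_COMM_WORLD,nc,nf,Nc,Nf,&user,&Rmat);CHKERRQ(ierr);
ierr = MatShellSetOperation(Rmat,MATOP_MULT,(void(*)(void))RestrictMult);CHKERRQ(ierr);
ierr = MatCreateShell(PETSC_COMM_WORLD,nf,nc,Nf,Nc,&user,&Pmat);CHKERRQ(ierr);
ierr = MatShellSetOperation(Pmat,MATOP_MULT,(void(*)(void))InterpMult);CHKERRQ(ierr);
/* hand PCMG both operators so neither is silently replaced by the other's transpose */
ierr = PCMGSetRestriction(pc,level,Rmat);CHKERRQ(ierr);
ierr = PCMGSetInterpolation(pc,level,Pmat);CHKERRQ(ierr);

Because each shell matrix carries its own MatMult, the coarse and fine sizes can even coincide without running into the equal-size ambiguity discussed above.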
>>> >>> ?Barry >>> >>>> >>>> Vijay >>>> >>>> On Wed, Dec 15, 2010 at 8:04 PM, Barry Smith wrote: >>>>> >>>>> ?I have pushed this change to petsc-dev and it is ready for use. >>>>> >>>>> ? Barry >>>>> >>>>> ?Note it can still glitch if the restricted size is exactly the original size. :-( >>>>> >>>>> >>>>> On Dec 15, 2010, at 7:53 PM, Barry Smith wrote: >>>>> >>>>>> >>>>>> ?Vijay, >>>>>> >>>>>> ? ?The use of M>N in MatRestrict and MatInterpolate was always a bit cheesy since it has this broken case that you reported. I will change it to do as you suggest and use the size of the vectors in determining which way to apply. But note I will do this in petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html not petsc-3.1 so you'll need to switch if you are not using petsc-dev. >>>>>> >>>>>> ? I'll try to get it down in the next few hours but it may take a little longer. >>>>>> >>>>>> >>>>>> ? Barry >>>>>> >>>>>> On Dec 15, 2010, at 6:06 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have an implementation issue with the MatRestrict/Interpolate >>>>>>> functions. The problem is that one of my coarser levels (with PCMG) >>>>>>> has higher dofs than the finest level. This does not always happen and >>>>>>> requires a weird fine mesh system (in a sense) that uses multi-grid, >>>>>>> but the idea is that the finest level problem has a high order (HO) >>>>>>> discretization while the lower level mesh has a linear tesselation of >>>>>>> the finest HO level (which I can optimize) and then adaptively >>>>>>> coarsened levels beyond that. Since the number of columns in this case >>>>>>> is larger than the number of rows, MatRestrict invariably calls >>>>>>> MatMultTranspose to multiply instead of MatMult and vice-versa while >>>>>>> calling ?MatInterpolate. These result in assertion errors while >>>>>>> comparing the length of Mat and Vec. The chosen method is based on >>>>>>> whether (M>N) which seems to act against what I am doing here... >>>>>>> >>>>>>> I can always implement a shell matrix to replicate >>>>>>> Restrict/Interpolate actions but my question is whether if such >>>>>>> discretization will yield a consistent convergence in MG algorithm ? >>>>>>> Is there a strong reason for checking if (M>N) rather than just doing >>>>>>> (mat->rmap->N==y->map->N && mat->cmap->N==x->map->N) ? I would >>>>>>> appreciate any detailed answer that you can provide for this and any >>>>>>> suggestions to use the existing methods (without implementing the >>>>>>> shell restriction) is very welcome. >>>>>>> >>>>>>> Thanks, >>>>>>> vijay >>>>>> >>>>> >>>>> >>> >>> > > From enjoywm at cs.wm.edu Thu Dec 16 14:52:53 2010 From: enjoywm at cs.wm.edu (enjoywm at cs.wm.edu) Date: Thu, 16 Dec 2010 15:52:53 -0500 Subject: [petsc-users] installation correct? Message-ID: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> Hi, After make test, I got the following output. I want to make sure if the installation is correct. Thanks. 
Yixun command: /petsc-3.1-p6> make PETSC_DIR=/yliu/MyVC/petsc-3.1-p6 PETSC_ARCH=linux-gnu-c-debug test ******************************************output**************************************************************** Running test examples to verify correct installation C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /yliu/MPICH2/bin/mpif90 -c -Wall -Wno-unused-variable -g -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MyVC/petsc-3.1-p6/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include -o ex5f.o ex5f.F ex5f.F:92.72: call PetscOptionsGetReal(PETSC_NULL_CHARACTER,'-par',lambda, 1 Warning: Line truncated at (1) ex5f.F:113.72: call DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR, 1 Warning: Line truncated at (1) ex5f.F:114.72: & i4,i4,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:125.72: call DAGetInfo(da,PETSC_NULL_INTEGER,mx,my,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:126.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:127.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:128.72: & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, 1 Warning: Line truncated at (1) ex5f.F:130.72: call DAGetCorners(da,xs,ys,PETSC_NULL_INTEGER,xm,ym, 1 Warning: Line truncated at (1) ex5f.F:132.72: call DAGetGhostCorners(da,gxs,gys,PETSC_NULL_INTEGER,gxm,gym, 1 Warning: Line truncated at (1) ex5f.F:188.72: call SNESSetJacobian(snes,A,J,SNESDAComputeJacobian, 1 Warning: Line truncated at (1) ex5f.F:344.72: if (i .eq. 1 .or. j .eq. 1 1 Warning: Line truncated at (1) ex5f.F:348.72: x(i,j) = temp1 * 1 Warning: Line truncated at (1) ex5f.F:412.72: if (i .eq. 1 .or. j .eq. 1 1 Warning: Line truncated at (1) ex5f.F:417.72: uxx = hydhx * (two*u 1 Warning: Line truncated at (1) ex5f.F:517.72: if (i .eq. 1 .or. j .eq. 
1 1 Warning: Line truncated at (1) ex5f.F:522.72: call MatSetValuesLocal(jac,i1,row,i1,col,v, 1 Warning: Line truncated at (1) ex5f.F:528.72: v(3) = two*(hydhx + hxdhy) 1 Warning: Line truncated at (1) ex5f.F:537.72: call MatSetValuesLocal(jac,i1,row,i5,col,v, 1 Warning: Line truncated at (1) /yliu/MPICH2/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lparmetis -lmetis -lflapack -lfblas -lm -L/yliu/MPICH2/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.5 -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl /bin/rm -f ex5f.o Fortran example src/snes/examples/tutorials/ex5f run successfully with 1 MPI process Completed test examples ************************************************************************************************************************** From balay at mcs.anl.gov Thu Dec 16 14:55:41 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 16 Dec 2010 14:55:41 -0600 (CST) Subject: [petsc-users] installation correct? In-Reply-To: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> References: <3164c3ba3dd9c86908be6debdc9f9788.squirrel@mail.cs.wm.edu> Message-ID: yes - you have a valid install. You can ignore the gfortran warnings [if you wish to eliminate them - you can use FFLAGS=-Wno-line-truncation with configure] Satish On Thu, 16 Dec 2010, enjoywm at cs.wm.edu wrote: > Hi, > After make test, I got the following output. > I want to make sure if the installation is correct. > > Thanks. > > Yixun > > command: > /petsc-3.1-p6> make PETSC_DIR=/yliu/MyVC/petsc-3.1-p6 > PETSC_ARCH=linux-gnu-c-debug test > > > ******************************************output**************************************************************** > Running test examples to verify correct installation > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI > process > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI > processes > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html > /yliu/MPICH2/bin/mpif90 -c -Wall -Wno-unused-variable -g > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include > -I/yliu/MyVC/petsc-3.1-p6/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include > -I/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/include -I/yliu/MPICH2/include > -o ex5f.o ex5f.F > ex5f.F:92.72: > > call PetscOptionsGetReal(PETSC_NULL_CHARACTER,'-par',lambda, > 1 > Warning: Line truncated at (1) > ex5f.F:113.72: > > call DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR, > 1 > Warning: Line truncated at (1) > ex5f.F:114.72: > > & i4,i4,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:125.72: > > call DAGetInfo(da,PETSC_NULL_INTEGER,mx,my,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:126.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:127.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > ex5f.F:128.72: > > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > 1 > Warning: Line truncated at (1) > 
ex5f.F:130.72: > > call DAGetCorners(da,xs,ys,PETSC_NULL_INTEGER,xm,ym, > 1 > Warning: Line truncated at (1) > ex5f.F:132.72: > > call DAGetGhostCorners(da,gxs,gys,PETSC_NULL_INTEGER,gxm,gym, > 1 > Warning: Line truncated at (1) > ex5f.F:188.72: > > call SNESSetJacobian(snes,A,J,SNESDAComputeJacobian, > 1 > Warning: Line truncated at (1) > ex5f.F:344.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:348.72: > > x(i,j) = temp1 * > 1 > Warning: Line truncated at (1) > ex5f.F:412.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:417.72: > > uxx = hydhx * (two*u > 1 > Warning: Line truncated at (1) > ex5f.F:517.72: > > if (i .eq. 1 .or. j .eq. 1 > 1 > Warning: Line truncated at (1) > ex5f.F:522.72: > > call MatSetValuesLocal(jac,i1,row,i1,col,v, > 1 > Warning: Line truncated at (1) > ex5f.F:528.72: > > v(3) = two*(hydhx + hxdhy) > 1 > Warning: Line truncated at (1) > ex5f.F:537.72: > > call MatSetValuesLocal(jac,i1,row,i5,col,v, > 1 > Warning: Line truncated at (1) > /yliu/MPICH2/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o > -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib > -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lpetsc -lX11 > -Wl,-rpath,/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib > -L/yliu/MyVC/petsc-3.1-p6/linux-gnu-c-debug/lib -lparmetis -lmetis > -lflapack -lfblas -lm -L/yliu/MPICH2/lib > -L/usr/lib64/gcc/x86_64-suse-linux/4.5 -L/usr/x86_64-suse-linux/lib -ldl > -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lgfortran -lm -lm > -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > /bin/rm -f ex5f.o > Fortran example src/snes/examples/tutorials/ex5f run successfully with 1 > MPI process > Completed test examples > > ************************************************************************************************************************** > > > > > From mmnasr at gmail.com Fri Dec 17 09:23:42 2010 From: mmnasr at gmail.com (Mohamad M. Nasr-Azadani) Date: Fri, 17 Dec 2010 07:23:42 -0800 Subject: [petsc-users] Matrix setup and Distributed arrays, Message-ID: Hi guys, I am trying to solve a simpel Poisson equation on regular grid (test case for my code). This is my problem. Sorry for long discussion. I want to make it clear what I am doing. 1- Create distributed array (same number of processors and parallel layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 2- Create distributed array (same number of processors and parallel layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, &A); 4- Setup Matrix A in the corresponding fashion 4-1 For all the regular local nodes (based on DA_3D_STAR), simply use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. 4-2 For some special nodes (still local though) use a 9-point interpolation equation. Theses nonzeros might extend outside of 1-layer of ghost nodes, also it might be in the direction not corresponding to the 7-point stencil. Hence, it would be possible that some rows in the matrix owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() does not work. Here, the DA_3D_BOX comes useful. 4-3 Return the global index of any node within the range of DA_3D_STAR, using the command ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, &global_indices); 4-4 Now, I can freely use MatSetValues() to insert the 9-point interpolation nonzeros into the matrix (using the returned global indices). 
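Spelled out in code, steps 1-4 above look roughly like the sketch below. It is illustrative only: mx/my/mz, the (i,j,k) indices, local_row/local_col and the interp_vals array are placeholders, and the actual stencil entries are not filled in.

DA          da_star,da_box;
Mat         A;
MatStencil  row,cols[7];
PetscScalar v[7],interp_vals[9];
PetscInt    nghost,*ltog,grow,gcols[9],n;

/* two DAs with identical global size and process layout, different stencils */
ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR,mx,my,mz,
                  PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,
                  PETSC_NULL,PETSC_NULL,PETSC_NULL,&da_star);CHKERRQ(ierr);
ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,mx,my,mz,
                  PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,3,
                  PETSC_NULL,PETSC_NULL,PETSC_NULL,&da_box);CHKERRQ(ierr);

/* matrix preallocated for the 7-point star stencil */
ierr = DAGetMatrix(da_star,MATMPIAIJ,&A);CHKERRQ(ierr);

/* regular rows: logical (i,j,k) indexing via MatSetValuesStencil() */
row.i = i; row.j = j; row.k = k;
/* ... fill cols[0..6] and v[0..6] ... */
ierr = MatSetValuesStencil(A,1,&row,7,cols,v,INSERT_VALUES);CHKERRQ(ierr);

/* special rows: translate ghosted local indices of the BOX DA into global
   indices and insert with MatSetValues(); these entries fall outside the
   star-stencil preallocation done by DAGetMatrix() */
ierr = DAGetGlobalIndices(da_box,&nghost,&ltog);CHKERRQ(ierr);
grow = ltog[local_row];
for (n=0; n<9; n++) gcols[n] = ltog[local_col[n]];
ierr = MatSetValues(A,1,&grow,9,gcols,interp_vals,INSERT_VALUES);CHKERRQ(ierr);

/* one final assembly, called on every process, after ALL insertions */
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);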
I thought this should work since MatSetValues() does not care if it is inserting in the local part or the global section of the matrix on the neighboring processor. However, for a very simple test case (solution of Poisson equation), this does not work in parallel. I narrowed it down and came to a very simple but annoying conclusion. I don't think the test code that I have suffers from any bugs. But this is what I got, When a new nonzero is inserted in (even) one row of the matrix where the new nonzero corresponds to a node from a neighboring processor and outside the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. Even if I insert a "zero" value, but at the given place, unfortunately I get wrong result. Note that, for a matrix of 10*10*10, I even tested that for one single row of a matrix, and unbelievably, I get wrong results. I printed the matrix for two cases where there is a new (nonzero) in the row, and there is not! Comparing the two matrices and using >>diff Bad Good This is the only existing difference: 150c150 < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) (212, 0) --- > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) They are exactly the same, except that there is new zero at (212) which should not alter the results! But I don't get the right result for the first case! This is driving me crazy! I don't have any clue why this happens. I can send you the test code if you think it helps. But to me, it sounds like when the matrix is created using DAGetMatrix(), there might be some sort of restriction to adding new nonzeros to the locations not defined based on the stencil and not within the range of the DA. In advance, thank you so much. Mohamad -------------- next part -------------- An HTML attachment was scrubbed... URL: From keita at cray.com Fri Dec 17 11:53:08 2010 From: keita at cray.com (Keita Teranishi) Date: Fri, 17 Dec 2010 11:53:08 -0600 Subject: [petsc-users] Why cannot ParMetis and Scotch be intergrated togther? Message-ID: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> Hi, I found PETSc's configure script fails to integrate ParMetis and Scotch together. I think Scotch has a option to rename it's ParMetis API to eliminate entry point duplications. Or is there any reason PETSc cannot put them together? Thanks, ================================ Keita Teranishi Scientific Library Group Cray, Inc. keita at cray.com ================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 17 12:34:14 2010 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Dec 2010 10:34:14 -0800 Subject: [petsc-users] Matrix setup and Distributed arrays, In-Reply-To: References: Message-ID: On Fri, Dec 17, 2010 at 7:23 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I am trying to solve a simpel Poisson equation on regular grid (test case > for my code). > This is my problem. Sorry for long discussion. I want to make it clear what > I am doing. 
> > > 1- Create distributed array (same number of processors and parallel > layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 > 2- Create distributed array (same number of processors and parallel > layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 > 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, > &A); > 4- Setup Matrix A in the corresponding fashion > > 4-1 For all the regular local nodes (based on DA_3D_STAR), simply > use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. > 4-2 For some special nodes (still local though) use a 9-point > interpolation equation. Theses nonzeros might extend outside of 1-layer of > ghost nodes, also it might be in the direction not corresponding to the > 7-point stencil. Hence, it would be possible that some rows in the matrix > owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() > does not work. > Here, the DA_3D_BOX comes useful. > 4-3 Return the global index of any node within the range of > DA_3D_STAR, using the command > ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, > &global_indices); > 4-4 Now, I can freely use MatSetValues() to insert the 9-point > interpolation nonzeros into the matrix (using the returned global indices). > > I thought this should work since MatSetValues() does not care if it is > inserting in the local part or the global section of the matrix on the > neighboring processor. > However, for a very simple test case (solution of Poisson equation), this > does not work in parallel. > I narrowed it down and came to a very simple but annoying conclusion. I > don't think the test code that I have suffers from any bugs. > But this is what I got, > When a new nonzero is inserted in (even) one row of the matrix where the > new nonzero corresponds to a node from a neighboring processor and outside > the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. > > Even if I insert a "zero" value, but at the given place, unfortunately I > get wrong result. > > Note that, for a matrix of 10*10*10, I even tested that for one single row > of a matrix, and unbelievably, I get wrong results. > I printed the matrix for two cases where there is a new (nonzero) in the > row, and there is not! > Comparing the two matrices and using > > >>diff Bad Good > > This is the only existing difference: > > 150c150 > < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > (212, 0) > --- > > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > > > They are exactly the same, except that there is new zero at (212) which > should not alter the results! > But I don't get the right result for the first case! > I do not know what you mean by the "right result". If the only difference is a 0 in the matrix, you will get the same result for MatMult(). Did you check this? Thanks, Matt > This is driving me crazy! I don't have any clue why this happens. > > I can send you the test code if you think it helps. > But to me, it sounds like when the matrix is created using DAGetMatrix(), > there might be some sort of restriction to adding new nonzeros to the > locations not defined based on the stencil and not within the range of the > DA. > > In advance, thank you so much. > Mohamad > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
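Matt's "did you check this" can be answered mechanically: apply both assembled matrices to the same vector and look at the norm of the difference. A small sketch, where A_good and A_bad stand for the two assembled operators being compared (they are not variables from the original code):

Vec         x,y1,y2;
PetscReal   nrm;
PetscRandom rctx;

ierr = MatGetVecs(A_good,&x,&y1);CHKERRQ(ierr);
ierr = VecDuplicate(y1,&y2);CHKERRQ(ierr);
ierr = PetscRandomCreate(PETSC_COMM_WORLD,&rctx);CHKERRQ(ierr);
ierr = VecSetRandom(x,rctx);CHKERRQ(ierr);

ierr = MatMult(A_good,x,y1);CHKERRQ(ierr);
ierr = MatMult(A_bad,x,y2);CHKERRQ(ierr);

ierr = VecAXPY(y2,-1.0,y1);CHKERRQ(ierr);        /* y2 <- y2 - y1 */
ierr = VecNorm(y2,NORM_2,&nrm);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD,"||A_bad*x - A_good*x|| = %G\n",nrm);CHKERRQ(ierr);

ierr = PetscRandomDestroy(rctx);CHKERRQ(ierr);
ierr = VecDestroy(x);CHKERRQ(ierr);
ierr = VecDestroy(y1);CHKERRQ(ierr);
ierr = VecDestroy(y2);CHKERRQ(ierr);

If the printed norm is at machine precision, the explicit zero is not what changes the solution, and the difference has to come from elsewhere in the setup or solve.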
URL: From bsmith at mcs.anl.gov Fri Dec 17 12:45:22 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 17 Dec 2010 12:45:22 -0600 Subject: [petsc-users] Matrix setup and Distributed arrays, In-Reply-To: References: Message-ID: <72A66DA3-BB86-4276-9C2C-C680A0A23D6A@mcs.anl.gov> Send the code to petsc-maint at mcs.anl.gov likely you are not calling MatAssemblyBegin/End or something similar in the correct place. Barry On Dec 17, 2010, at 9:23 AM, Mohamad M. Nasr-Azadani wrote: > Hi guys, > > I am trying to solve a simpel Poisson equation on regular grid (test case for my code). > This is my problem. Sorry for long discussion. I want to make it clear what I am doing. > > > 1- Create distributed array (same number of processors and parallel layout): DA_3D_STAR: DA_STAR_STENCIL, width=1 > 2- Create distributed array (same number of processors and parallel layout): DA_3D_BOX : DA_BOX_STENCIL, width=3 > 3- Create Matrix (A) using: ierr = DAGetMatrix(DA_3D_STAR, MATMPIAIJ, &A); > 4- Setup Matrix A in the corresponding fashion > > 4-1 For all the regular local nodes (based on DA_3D_STAR), simply use MatSetValuesStencil() to insert maximum 7-nonzerons in each row. > 4-2 For some special nodes (still local though) use a 9-point interpolation equation. Theses nonzeros might extend outside of 1-layer of ghost nodes, also it might be in the direction not corresponding to the 7-point stencil. Hence, it would be possible that some rows in the matrix owns nonzeros not within the local+ghost node regions. MatSetValuesStencil() does not work. > Here, the DA_3D_BOX comes useful. > 4-3 Return the global index of any node within the range of DA_3D_STAR, using the command > ierr = DAGetGlobalIndices(DA_3D_BOX, &nt, &global_indices); > 4-4 Now, I can freely use MatSetValues() to insert the 9-point interpolation nonzeros into the matrix (using the returned global indices). > > I thought this should work since MatSetValues() does not care if it is inserting in the local part or the global section of the matrix on the neighboring processor. > However, for a very simple test case (solution of Poisson equation), this does not work in parallel. > I narrowed it down and came to a very simple but annoying conclusion. I don't think the test code that I have suffers from any bugs. > But this is what I got, > When a new nonzero is inserted in (even) one row of the matrix where the new nonzero corresponds to a node from a neighboring processor and outside the width=1, STAR_STENCIL layout, using MatSetValues(), I get wrong results. > Even if I insert a "zero" value, but at the given place, unfortunately I get wrong result. > > Note that, for a matrix of 10*10*10, I even tested that for one single row of a matrix, and unbelievably, I get wrong results. > I printed the matrix for two cases where there is a new (nonzero) in the row, and there is not! > Comparing the two matrices and using > > >>diff Bad Good > > This is the only existing difference: > > 150c150 > < row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) (212, 0) > --- > > row 149: (100, 0) (148, 0) (149, 1) (150, 0) (156, -1) (198, 0) > > > They are exactly the same, except that there is new zero at (212) which should not alter the results! > But I don't get the right result for the first case! > This is driving me crazy! I don't have any clue why this happens. > > I can send you the test code if you think it helps. 
> But to me, it sounds like when the matrix is created using DAGetMatrix(), there might be some sort of restriction to adding new nonzeros to the locations not defined based on the stencil and not within the range of the DA. > > In advance, thank you so much. > Mohamad > > > From bsmith at mcs.anl.gov Fri Dec 17 12:47:21 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 17 Dec 2010 12:47:21 -0600 Subject: [petsc-users] Why cannot ParMetis and Scotch be intergrated togther? In-Reply-To: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> References: <5D6E0DF460ACF34C88644E1EA91DCD0D01B23D54CF@CFEXMBX.americas.cray.com> Message-ID: They use to not be able to coexist in the same build. If Scotch has been properly fixed and you can figure out how to change Scotch's configuration to coexist we would be happy to accept the patch. We don't have the resources to fix a problem induced by someone else's very poor library design. Barry On Dec 17, 2010, at 11:53 AM, Keita Teranishi wrote: > Hi, > > I found PETSc?s configure script fails to integrate ParMetis and Scotch together. I think Scotch has a option to rename it?s ParMetis API to eliminate entry point duplications. Or is there any reason PETSc cannot put them together? > > Thanks, > ================================ > Keita Teranishi > Scientific Library Group > Cray, Inc. > keita at cray.com > ================================ > From sapphire.jxy at gmail.com Sun Dec 19 22:17:16 2010 From: sapphire.jxy at gmail.com (Xiaoyin Ji) Date: Sun, 19 Dec 2010 23:17:16 -0500 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Hi, I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ksp solvers...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Regards, Xiaoyin Ji From hzhang at mcs.anl.gov Mon Dec 20 09:12:09 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 20 Dec 2010 09:12:09 -0600 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Xiaoyin: > > I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ksp solvers...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Yes, you can. See "Shell Preconditioners" in petsc user manual. Also, check out ~petsc/src/ksp/ksp/examples/tutorials/ex15.c as an example. Hong From sapphire.jxy at gmail.com Mon Dec 20 09:47:49 2010 From: sapphire.jxy at gmail.com (Xiaoyin Ji) Date: Mon, 20 Dec 2010 10:47:49 -0500 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: Hi, I was wondering if petsc could convert an existing matrix into a preconditioner and use it in ks...thought this shall be a straight-forward question but I didn't find the answer in manuals. Thanks a lot. Regards, Xiaoyin Ji From knepley at gmail.com Mon Dec 20 09:51:47 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Dec 2010 07:51:47 -0800 Subject: [petsc-users] self-defined preconditioner available? In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 7:47 AM, Xiaoyin Ji wrote: > Hi, > > I was wondering if petsc could convert an existing matrix into a > preconditioner and use it in ks...thought this shall be a straight-forward > question but I didn't find the answer in manuals. Thanks a lot. 
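Hong's pointer above to shell preconditioners (PCSHELL, ex15.c) boils down to the pattern sketched below: wrap the existing matrix in a PCSHELL whose apply routine multiplies by it, i.e. the matrix is treated as an (approximate) inverse. The names B and UserShellApply are illustrative, and the exact callback signature should be checked against ex15.c of the installed release (older releases passed a void* context instead of the PC).

/* apply the "preconditioner": y = B*x, with B playing the role of M^{-1} */
PetscErrorCode UserShellApply(PC pc,Vec x,Vec y)
{
  Mat            B;
  PetscErrorCode ierr;
  ierr = PCShellGetContext(pc,(void**)&B);CHKERRQ(ierr);
  ierr = MatMult(B,x,y);CHKERRQ(ierr);
  return 0;
}

ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc,(void*)B);CHKERRQ(ierr);    /* B is the existing matrix */
ierr = PCShellSetApply(pc,UserShellApply);CHKERRQ(ierr);

If instead the existing matrix is meant to be used to build a standard preconditioner (Jacobi, ILU, ...), it can simply be passed as the preconditioning-matrix argument of KSPSetOperators().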
> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/PC/PCSHELL.html Matt > Regards, > > Xiaoyin Ji -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 20 10:41:07 2010 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 20 Dec 2010 10:41:07 -0600 Subject: [petsc-users] no PCFactorSetUseDropTolerance() in 3.1-p6 In-Reply-To: <4CFF3D10.6000600@in.tum.de> References: <4CFE4FFE.9010602@in.tum.de> <4CFF3D10.6000600@in.tum.de> Message-ID: Tobias : > Ah, ok. Is there a possibility to hardwire that in the source code? We run > different integration tests with different solver/pc combinations with one > single executable call (currently without any options). Yes. We've added MatSuperluSetILUDropTol() procedural call to petsc-dev. See http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html on how to get petsc-dev. ~petsc-dev/src/ksp/ksp/examples/tutorials/ex52.c is an example on how to use it. > Ok, thanks. Will this functionality be available also in the next release > (p7) and when is this expected (approximately)? It will be included in next petsc release (v3.2). When? early next year I guess :-) Hong > > Thanks and best regards > Tobias > >>> we recently switched from 3.0.0-p11 to 3.1-p6 and are now facing a minor >>> problem in some of our test cases: We use seqaij matrix format and GMRES >>> with ILU dt. >>> >>> Our code contained a statement using PCFactorSetUseDropTolerance() which >>> does not exist any longer. So we renamed the function call to >>> PCFactorSetDropTolerance (same signature) which has no online docu but >>> found >>> by google ;-) Obviously, this function has not the same functionality as >>> our >>> tests fail. >>> >>> In 3.0.0-p11, the docu of the "Summary of Sparse Linear Solvers Available >>> from PETSc" showed a line containing: >>> ILU dt: ILU dt ?seqaij ?Sparsekit (table survey) >>> >>> This is not included in the docu of 3.1-p6 >>> >>> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html >>> ) any more. Does that mean that the ILU dt version is not supported by >>> the >>> (default) petsc? I also checked the changelog but did not find anything >>> there; sorry if I missed sth. >>> >>> Any help or infos will be highly appreciated ;-) >>> >>> Thanks and best regards >>> Tobias >>> > From yjxd.chen at gmail.com Mon Dec 20 10:46:44 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 17:46:44 +0100 Subject: [petsc-users] Very poor speed up performance Message-ID: Hi everyone, I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A and right hand vector b are read from files. The dimension of A is 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been read correctly. I compiled the program with optimized version (--with-debugging=0), tested the speed up performance on two servers, and I have found that the performance is very poor. For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 cores. 
On each of them, with the increasing of computing cores k from 1 to 8 (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up will increase from 1 to 6, but when the computing cores k increase from 9 to 16(for the first server) or 48 (for the second server), the speed up decrease firstly and then remains a constant value 5.0 (for the first server) or 4.5(for the second server). Actually, the program LAMMPS speed up excellently on these two servers. Any comments are very appreciated! Thanks! -------------------------------------------------------------------------------------------------------------------------- PS: the related codes are as following, //firstly read A and b from files ... //then ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = VecAssemblyBegin(b); CHKERRQ(ierr); ierr = VecAssemblyEnd(b); CHKERRQ(ierr); ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr); ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); ierr = VecAssemblyBegin(x);CHKERRQ(ierr); ierr = VecAssemblyEnd(x);CHKERRQ(ierr); ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 20 11:06:32 2010 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Dec 2010 09:06:32 -0800 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: > > Hi everyone, > > > I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A > and right hand vector b are read from files. The dimension of A is > 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been > read correctly. > > I compiled the program with optimized version (--with-debugging=0), tested > the speed up performance on two servers, and I have found that the > performance is very poor. > > For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 > cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 > cores. > > On each of them, with the increasing of computing cores k from 1 to 8 > (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up > will increase from 1 to 6, but when the computing cores k increase from 9 to > 16(for the first server) or 48 (for the second server), the speed up > decrease firstly and then remains a constant value 5.0 (for the first > server) or 4.5(for the second server). > We cannot say anything at all without -log_summary data for your runs. Matt > Actually, the program LAMMPS speed up excellently on these two servers. > > Any comments are very appreciated! Thanks! > > > > > -------------------------------------------------------------------------------------------------------------------------- > > PS: the related codes are as following, > > > //firstly read A and b from files > > ... 
> > //then > > > > ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); > CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = VecAssemblyBegin(b); CHKERRQ(ierr); > > ierr = VecAssemblyEnd(b); CHKERRQ(ierr); > > > > ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); > CHKERRQ(ierr); > > ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > > > ierr = > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > ierr = > KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > > > ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); > > > > ierr = VecAssemblyBegin(x);CHKERRQ(ierr); > > ierr = VecAssemblyEnd(x);CHKERRQ(ierr); > > ... > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 20 12:36:34 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 20 Dec 2010 12:36:34 -0600 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: <37E7E191-1C32-4082-A680-2B6ACD556895@mcs.anl.gov> See http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers in particular note the discussion on memory bandwidth. Once you have started using multiple cores per CPU you will start to see very little speedup with Jacobi preconditioning since it is very memory bandwidth limited. In fact pretty much all sparse iterative solvers are memory bandwidth limited. Barry On Dec 20, 2010, at 10:46 AM, Yongjun Chen wrote: > > Hi everyone, > > > > I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A and right hand vector b are read from files. The dimension of A is 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been read correctly. > > I compiled the program with optimized version (--with-debugging=0), tested the speed up performance on two servers, and I have found that the performance is very poor. > > For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 cores. > > On each of them, with the increasing of computing cores k from 1 to 8 (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up will increase from 1 to 6, but when the computing cores k increase from 9 to 16(for the first server) or 48 (for the second server), the speed up decrease firstly and then remains a constant value 5.0 (for the first server) or 4.5(for the second server). > > Actually, the program LAMMPS speed up excellently on these two servers. > > Any comments are very appreciated! Thanks! > > > -------------------------------------------------------------------------------------------------------------------------- > > PS: the related codes are as following, > > > > //firstly read A and b from files > > ... 
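Barry's memory-bandwidth remark above can be made concrete with a rough, back-of-the-envelope estimate from the numbers in the logs that follow (the 12 bytes per stored nonzero is an assumption about AIJ/SBAIJ storage, not a measurement):

  stored nonzeros:   49,908,476  ->  about 0.6 GB of matrix data (8-byte value + 4-byte index each)
  k=2 MatMult:       1476 calls in about 3.4e+02 s  ->  roughly 0.23 s per multiply
  implied traffic:   0.6 GB / 0.23 s  ~  2.6 GB/s sustained by the 2 processes, for the matrix data alone

Perfect scaling to 16 processes would need roughly eight times that, i.e. well over 20 GB/s plus the vector traffic; once the memory system cannot deliver it, additional cores simply wait on memory, which matches the speed-up flattening out around 5-6.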
> > //then > > > ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = VecAssemblyBegin(b); CHKERRQ(ierr); > > ierr = VecAssemblyEnd(b); CHKERRQ(ierr); > > > ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr); > > ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > > ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > > ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); > > > ierr = VecAssemblyBegin(x);CHKERRQ(ierr); > > ierr = VecAssemblyEnd(x);CHKERRQ(ierr); > > ... > > > From yjxd.chen at gmail.com Mon Dec 20 12:38:31 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 19:38:31 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Hi Matt, Thanks for your reply. Just now I have carried out a series of tests with k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary option. From 8 cores to 12 cores, a small speed up has been found this time, but from 12 cores to 16 cores, the computation time increase! Attached please find these 5 log files. Thank you very much! mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary Here, I use ksp bicg instead of gmres, because the two ksp gives almost the same speed up performance, as I have tried many times. ---------------------- (1) k=2 ---------------------- Process 1 of total 2 on wmss04 Process 0 of total 2 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:42:23 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.25862e-06 Norm of error 1.25862e-06, Iterations 1475 ========================================================= The solver has finished successfully! ========================================================= The solving time is 762.874 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 17:55:06 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny Mon Dec 20 18:55:06 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 8.160e+02 1.00000 8.160e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 MPI Reductions: 4.483e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 100.0% 3.245e+05 100.0% 4.467e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 4.4e+03 92100100100 99 92100100100 99 811 PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 339744648 0 Vec 18 18 62239872 0 Vec Scatter 2 2 1736 0 Index Set 4 4 974736 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.21593e-06 Average time for MPI_Barrier(): 1.44005e-05 Average time for zero size MPI_Send(): 1.94311e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
------------------------------------------ ---------------------- (2) k=4 ---------------------- Process 0 of total 4 on wmss04 Process 2 of total 4 on wmss04 Process 3 of total 4 on wmss04 Process 1 of total 4 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:33:24 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28342e-06 Norm of error 1.28342e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 450.583 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 17:40:55 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Mon Dec 20 18:40:55 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 4.807e+02 1.00000 4.807e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 100.0% 2.658e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 91100100100 99 91100100100 99 1386 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 169902696 0 Vec 18 18 31282096 0 Vec Scatter 2 2 1736 0 Index Set 4 4 638616 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.5974e-06 Average time for MPI_Barrier(): 3.48091e-05 Average time for zero size MPI_Send(): 1.8537e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (3) k=8 ---------------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:14:59 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 311.937 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:20:11 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon Dec 20 19:20:11 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.330e+02 1.00000 3.330e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2024 PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 3.38554e-06 Average time for MPI_Barrier(): 7.40051e-05 Average time for zero size MPI_Send(): 1.88947e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (4) k=12 ---------------------- Process 1 of total 12 on wmss04 Process 5 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 0 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:56:36 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.503 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:01:28 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Mon Dec 20 19:01:28 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.089e+02 1.00012 3.089e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99 91100100100 99 2173 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.48499e-06 Average time for MPI_Barrier(): 0.000102377 Average time for zero size MPI_Send(): 2.15967e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- ---------------------- (5) k=16 ---------------------- Process 0 of total 16 on wmss04 Process 8 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 11 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 7 of total 16 on wmss04 Process Process 15 of total 16 on wmss04 3Process 13 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 1 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 of total 16 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly.End Assembly. End Assembly.End Assembly.End Assembly.End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:02:28 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.15892e-06 Norm of error 1.15892e-06, Iterations 1497 ========================================================= The solver has finished successfully! ========================================================= The solving time is 337.91 seconds. The time accuracy is 1e-06 second. 
The current time is Mon Dec 20 18:08:06 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Mon Dec 20 19:08:06 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.534e+02 1.00001 3.534e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 MPI Reductions: 4.549e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 100.0% 1.819e+05 100.0% 4.533e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 92100100100 99 92100100100 99 1893 PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 6.10352e-06
Average time for MPI_Barrier(): 0.000129986
Average time for zero size MPI_Send(): 2.08169e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Nov 23 15:54:45 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------

On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote:
> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote:
>>
>> Hi everyone,
>>
>> I use PETSc (version 3.1-p5) to solve a linear problem Ax=b. The matrix A
>> and the right-hand-side vector b are read from files. The dimension of A is
>> 1.2 million * 1.2 million. I am pretty sure the matrix A and vector b have
>> been read correctly.
>>
>> I compiled the program in its optimized version (--with-debugging=0) and
>> tested the speed-up on two servers, and I have found that the parallel
>> performance is very poor.
>>
>> Of the two servers, one has 4 CPUs with 4 cores per CPU, i.e., 16 cores in
>> total; the other has 4 CPUs with 12 cores per CPU, i.e., 48 cores in total.
>>
>> On each of them, as the number of computing cores k increases from 1 to 8
>> (mpiexec -n k ./Solver_MPI -pc_type jacobi -ksp_type gmres), the speed-up
>> increases from 1 to about 6; but as k increases further, from 9 up to 16
>> (on the first server) or 48 (on the second server), the speed-up first
>> decreases and then settles at a constant value of about 5.0 (first server)
>> or 4.5 (second server).
>>
> We cannot say anything at all without -log_summary data for your runs.
>
>    Matt
>
>> Actually, the program LAMMPS speeds up excellently on these two servers.
>>
>> Any comments are very much appreciated! Thanks!
>>
>> --------------------------------------------------------------------------------------------------------------------------
>>
>> PS: the related code is as follows,
>>
>> //firstly read A and b from files
>> ...
>>
>> //then
>>
>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr);
>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr);
>>
>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr);
>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr);
>>
>> ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
>> ierr = KSPGetPC(ksp,&pc); CHKERRQ(ierr);
>> ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT); CHKERRQ(ierr);
>> ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
>>
>> ierr = KSPSolve(ksp,b,x); CHKERRQ(ierr);
>>
>> ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
>>
>> ierr = KSPGetSolution(ksp, &x); CHKERRQ(ierr);
>>
>> ierr = VecAssemblyBegin(x); CHKERRQ(ierr);
>> ierr = VecAssemblyEnd(x); CHKERRQ(ierr);
>> ...
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener

--
Dr. Yongjun Chen
Room 2507, Building M
Institute of Materials Science and Technology
Technical University of Hamburg-Harburg
Eißendorfer Straße 42, 21073 Hamburg, Germany.
Tel: +49 (0)40-42878-4386
Fax: +49 (0)40-42878-4070
E-mail: yjxd.chen at gmail.com
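The -log_summary profiles above report every event under the single "Main Stage", so matrix/vector assembly and the KSP solve end up mixed in one event table. The summaries themselves note that stages can be separated with PetscLogStagePush() and PetscLogStagePop(). The following is only an illustrative, self-contained sketch, not the poster's solver: it builds a hypothetical toy diagonal system and assumes the PETSc 3.1-era C API used elsewhere in this thread, to show how assembly and solve could be logged as separate stages.

/* Sketch only (assumptions: PETSc 3.1-era C API, toy diagonal system).
 * Registers two logging stages so that -log_summary reports the
 * assembly phase and the solve phase in separate event sections. */
#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PetscLogStage  assembly, solve;
  PetscInt       i, n = 100, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char *)0, (char *)0); CHKERRQ(ierr);

  ierr = PetscLogStageRegister("Assembly", &assembly); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve",    &solve);    CHKERRQ(ierr);

  /* ---- Assembly stage: build a toy diagonal system A x = b ---- */
  ierr = PetscLogStagePush(assembly); CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    /* diagonal entries 1,2,3,... owned locally by each process */
    ierr = MatSetValue(A, i, i, (PetscScalar)(i + 1), INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
  ierr = VecSetSizes(b, PETSC_DECIDE, n); CHKERRQ(ierr);
  ierr = VecSetFromOptions(b); CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x); CHKERRQ(ierr);
  ierr = VecSet(b, 1.0); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  /* ---- Solve stage: everything in here is attributed to "Solve" ---- */
  ierr = PetscLogStagePush(solve); CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);   /* picks up -ksp_type / -pc_type */
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = KSPDestroy(ksp); CHKERRQ(ierr);
  ierr = VecDestroy(x); CHKERRQ(ierr);
  ierr = VecDestroy(b); CHKERRQ(ierr);
  ierr = MatDestroy(A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run, for example, as mpiexec -n 4 ./toy_solver -ksp_type bicg -pc_type jacobi -log_summary (the executable name here is hypothetical), and the resulting summary should then contain separate "Assembly" and "Solve" event sections, which makes it easier to see which phase loses parallel efficiency as the core count grows.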
------------------------------------------ -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 18:14:59 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 311.937 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:20:11 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Mon Dec 20 19:20:11 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.330e+02 1.00000 3.330e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2024 PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 3.38554e-06 Average time for MPI_Barrier(): 7.40051e-05 Average time for zero size MPI_Send(): 1.88947e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
------------------------------------------ -------------- next part -------------- Process 1 of total 12 on wmss04 Process 5 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 0 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Mon Dec 20 17:56:36 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.503 seconds. The time accuracy is 1e-06 second. The current time is Mon Dec 20 18:01:28 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Mon Dec 20 19:01:28 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.089e+02 1.00012 3.089e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 91100100100 99 91100100100 99 2173 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.48499e-06 Average time for MPI_Barrier(): 0.000102377 Average time for zero size MPI_Send(): 2.15967e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Nov 23 15:54:45 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl 
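
The -log_summary runs attached above reduce to a simple strong-scaling table. Below is a minimal, self-contained C sketch (not part of the original logs) that recomputes speed-up and parallel efficiency from the solving times reported in these runs; the 2-process run is taken as the baseline because no serial timing is included, and the times are copied from the logs. The file name scaling_check.c is illustrative only.

/* scaling_check.c (illustrative name): strong-scaling summary of the
 * BiCG/Jacobi runs reported in the attached -log_summary output.
 * Times are the "solving time" values from the logs; the 2-process
 * run serves as the baseline, so ideal speed-up on p processes is p/2. */
#include <stdio.h>

int main(void)
{
  const int    nprocs[] = {2, 4, 8, 12};
  const double time_s[] = {762.874, 450.583, 311.937, 291.503}; /* from the logs */
  const int    n = (int)(sizeof(nprocs) / sizeof(nprocs[0]));
  int i;

  printf("procs   solve time (s)   speed-up   parallel efficiency\n");
  for (i = 0; i < n; i++) {
    double speedup    = time_s[0] / time_s[i];                       /* relative to 2 processes */
    double efficiency = speedup * (double)nprocs[0] / (double)nprocs[i];
    printf("%5d   %14.3f   %8.2f   %19.2f\n", nprocs[i], time_s[i], speedup, efficiency);
  }
  return 0;
}

With these numbers the parallel efficiency falls from roughly 85% on 4 processes to about 61% on 8 and 44% on 12, while the MatMult rate reported in the logs only climbs from 846 Mflop/s (2 processes) to 1494, 2031 and finally 2054 Mflop/s (12 processes). The flop rate plateaus even though cores are added, which is the memory-bandwidth limitation discussed in the reply below.
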
------------------------------------------

From bsmith at mcs.anl.gov Mon Dec 20 12:52:00 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 20 Dec 2010 12:52:00 -0600
Subject: [petsc-users] self-defined preconditioner available?
In-Reply-To: 
References: 
Message-ID: <24441805-6855-4ACC-9C47-A897E4C3F463@mcs.anl.gov>

   It is unclear what you mean. Do you mean use the MatMult() of an existing matrix as the preconditioner application for some other matrix?

   If so, use the PCType PCMAT and call KSPSetOperators() or SNESSetJacobian() with two different matrices: the first defines the linear system, and the second is the matrix you wish to apply directly as the preconditioner. See the manual page for PCMAT.

   If you mean something else, please explain.

   Barry

On Dec 19, 2010, at 10:17 PM, Xiaoyin Ji wrote:

> Hi,
>
> I was wondering if PETSc could convert an existing matrix into a preconditioner and use it in KSP solvers. I thought this should be a straightforward question, but I didn't find the answer in the manuals. Thanks a lot.
>
> Regards,
>
> Xiaoyin Ji

From knepley at gmail.com Mon Dec 20 13:21:17 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 20 Dec 2010 11:21:17 -0800
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen wrote:

> Hi Matt,
>
> Thanks for your reply. Just now I have carried out a series of tests with
> k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary
> option. From 8 cores to 12 cores, a small speed-up has been found this time,
> but from 12 cores to 16 cores, the computation time increases!
> Attached please find these 5 log files. Thank you very much!
>

It's very clear from these that Barry was right in his reply. These are
memory-bandwidth-limited computations, so if you don't get any more bandwidth
you will not speed up. This is rarely mentioned in sales pitches for multicore
computers. LAMMPS is not limited by bandwidth for most computations.

   Matt

> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary
> Here, I use KSP bicg instead of gmres, because the two KSP types give almost the
> same speed-up performance, as I have tried many times.
> ----------------------
> (1) k=2
> ----------------------
> Process 1 of total 2 on wmss04
> Process 0 of total 2 on wmss04
> The dimension of Matrix A is n = 1177754
> Begin Assembly:
> Begin Assembly:
> End Assembly.
> End Assembly.
> =========================================================
> Begin the solving:
> =========================================================
> The current time is: Mon Dec 20 17:42:23 2010
>
> KSP Object:
>   type: bicg
>   maximum iterations=10000, initial guess is zero
>   tolerances: relative=1e-07, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:
>   type: jacobi
>   linear system matrix = precond matrix:
>   Matrix Object:
>     type=mpisbaij, rows=1177754, cols=1177754
>     total: nonzeros=49908476, allocated nonzeros=49908476
>     block size is 1
>
> norm(b-Ax)=1.25862e-06
> Norm of error 1.25862e-06, Iterations 1475
> =========================================================
> The solver has finished successfully!
> =========================================================
> The solving time is 762.874 seconds.
> The time accuracy is 1e-06 second.
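
A minimal, self-contained sketch of the PCMAT usage Barry describes in his reply above, written against the PETSc 3.1-era C API used elsewhere in this thread (MatStructure argument to KSPSetOperators(), object-style destroy calls). The 5x5 Laplacian, the inverse-diagonal matrix P, and the file name pcmat_sketch.c are illustrative only, not taken from the thread:

/* pcmat_sketch.c (illustrative): use an existing matrix directly as a
 * preconditioner via PCMAT.  The second matrix given to KSPSetOperators()
 * is applied by MatMult() as the preconditioner. */
static char help[] = "Apply an existing matrix directly as a preconditioner via PCMAT.\n";

#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A, P;      /* A defines the linear system; P is applied as the preconditioner */
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, n = 5;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char *)0, help);CHKERRQ(ierr);

  /* A: 1D Laplacian (tridiagonal 2, -1), just to have something to solve */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, PETSC_NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* P: applied by MatMult() as the preconditioner, so it should approximate
     inv(A); here simply 1/diag(A) = 0.5 on the diagonal */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 1, PETSC_NULL, &P);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = MatSetValue(P, i, i, 0.5, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  /* first matrix defines the linear system, second is used as the preconditioner */
  ierr = KSPSetOperators(ksp, A, P, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMAT);CHKERRQ(ierr);     /* apply P directly via MatMult() */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  /* PETSc 3.1-style cleanup (these calls take the object itself in this release) */
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = MatDestroy(P);CHKERRQ(ierr);
  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = VecDestroy(b);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The point to note is that PCMAT applies the second matrix with MatMult() as the preconditioner itself, so the matrix handed in should approximate the inverse of the operator rather than the operator.
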
> The current time is Mon Dec 20 17:55:06 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny > Mon Dec 20 18:55:06 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 8.160e+02 1.00000 8.160e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 > Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 > MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 > MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 > MPI Reductions: 4.483e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 > 100.0% 3.245e+05 100.0% 4.467e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 > 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 > MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 > 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 > MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 > 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 > VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 > VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 > VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 > VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 > VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 > 4.4e+03 92100100100 99 92100100100 99 811 > PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 339744648 0 > Vec 18 18 62239872 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 974736 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 1.21593e-06 > Average time for MPI_Barrier(): 1.44005e-05 > Average time for zero size MPI_Send(): 1.94311e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 > Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 > 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux > Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized > Using PETSc arch: linux-gnu-c-opt > ----------------------------------------- > Using C compiler: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall > -Wwrite-strings -Wno-strict-aliasing -O > Using Fortran compiler: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall > -Wno-unused-variable -O > ----------------------------------------- > Using include paths: > -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include > -I/sun42/cheny/petsc-3.1-p5-optimized/include > -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include > ------------------------------------------ > Using C linker: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall > -Wwrite-strings -Wno-strict-aliasing -O > Using Fortran linker: > /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall > -Wno-unused-variable -O > Using libraries: > -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc > -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx > -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord > -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt > -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib > -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 > -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib > -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t > -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib > -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 > -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich > -lpthread -lrt -lgcc_s -ldl > ------------------------------------------ > > > ---------------------- > (2) k=4 > ---------------------- > Process 0 of total 4 on wmss04 > Process 2 of total 4 on wmss04 > Process 3 of total 4 on wmss04 > Process 1 of total 4 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 17:33:24 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.28342e-06 > Norm of error 1.28342e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 450.583 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 17:40:55 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny > Mon Dec 20 18:40:55 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 4.807e+02 1.00000 4.807e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 > Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 > MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 > MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 > 100.0% 2.658e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. 
len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 > 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 > MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 > 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 > MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 > 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 > VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 > VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 > VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 > VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 > VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 > KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 > 4.4e+03 91100100100 99 91100100100 99 1386 > PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 169902696 0 > Vec 18 18 31282096 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 638616 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 1.5974e-06 > Average time for MPI_Barrier(): 3.48091e-05 > Average time for zero size MPI_Send(): 1.8537e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > ---------------------- > (3) k=8 > ---------------------- > Process 0 of total 8 on wmss04 > Process 4 of total 8 on wmss04 > Process 2 of total 8 on wmss04 > Process 6 of total 8 on wmss04 > Process 3 of total 8 on wmss04 > Process 7 of total 8 on wmss04 > Process 1 of total 8 on wmss04 > Process 5 of total 8 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 18:14:59 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.32502e-06 > Norm of error 1.32502e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 311.937 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 18:20:11 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny > Mon Dec 20 19:20:11 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.330e+02 1.00000 3.330e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 > Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 > MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 > MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 > 100.0% 2.430e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 > 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 > MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 > 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 > MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 > 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 > VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 > VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 > VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 > VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 > VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 > KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 > 4.4e+03 90100100100 99 90100100100 99 2024 > PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 84944064 0 > Vec 18 18 15741712 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 409008 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 3.38554e-06 > Average time for MPI_Barrier(): 7.40051e-05 > Average time for zero size MPI_Send(): 1.88947e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > ---------------------- > (4) k=12 > ---------------------- > Process 1 of total 12 on wmss04 > Process 5 of total 12 on wmss04 > Process 2 of total 12 on wmss04 > Process 9 of total 12 on wmss04 > Process 6 of total 12 on wmss04 > Process 7 of total 12 on wmss04 > Process 10 of total 12 on wmss04 > Process 3 of total 12 on wmss04 > Process 11 of total 12 on wmss04 > Process 4 of total 12 on wmss04 > Process 8 of total 12 on wmss04 > Process 0 of total 12 on wmss04 > The dimension of Matrix A is n = 1177754 > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. > End Assembly. > End Assembly. > > End Assembly. > End Assembly. > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 17:56:36 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.28414e-06 > Norm of error 1.28414e-06, Iterations 1473 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 291.503 seconds. > The time accuracy is 1e-06 second. 
> The current time is Mon Dec 20 18:01:28 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny > Mon Dec 20 19:01:28 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.089e+02 1.00012 3.089e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 > Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 > MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 > MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 > MPI Reductions: 4.477e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 > 100.0% 2.345e+05 100.0% 4.461e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 > 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 > MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 > 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 > MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 > 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 > VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 > VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 > VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 > VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 > VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 > KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 > 4.4e+03 91100100100 99 91100100100 99 2173 > PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 56593044 0 > Vec 18 18 10534536 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 305424 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 6.48499e-06 > Average time for MPI_Barrier(): 0.000102377 > Average time for zero size MPI_Send(): 2.15967e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > ---------------------- > (5) k=16 > ---------------------- > Process 0 of total 16 on wmss04 > Process 8 of total 16 on wmss04 > Process 4 of total 16 on wmss04 > Process 12 of total 16 on wmss04 > Process 2 of total 16 on wmss04 > Process 6 of total 16 on wmss04 > Process 5 of total 16 on wmss04 > Process 11 of total 16 on wmss04 > Process 14 of total 16 on wmss04 > Process 7 of total 16 on wmss04 > Process Process 15 of total 16 on wmss04 > 3Process 13 of total 16 on wmss04 > Process 10 of total 16 on wmss04 > Process 9 of total 16 on wmss04 > Process 1 of total 16 on wmss04 > The dimension of Matrix A is n = 1177754 > of total 16 on wmss04 > > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > Begin Assembly: > > Begin Assembly: > Begin Assembly: > End Assembly. > End Assembly.End Assembly. > End Assembly.End Assembly.End Assembly.End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. > > End Assembly. > End Assembly. > End Assembly. > End Assembly.End Assembly. 
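A side note on reading these summaries: every event in the tables lands in the single "Event Stage 0: Main Stage", so assembly and solve cannot be separated in the per-stage columns. The legend's hint about PetscLogStagePush() and PetscLogStagePop() refers to something like the following minimal sketch; this is illustrative only, not the poster's code, and the stage names plus the variables ierr, A, ksp, b, x stand for whatever the application already declares:

    PetscLogStage stageAssembly, stageSolve;  /* user-chosen logging stages */
    PetscErrorCode ierr;

    ierr = PetscLogStageRegister("Assembly", &stageAssembly); CHKERRQ(ierr);
    ierr = PetscLogStageRegister("Solve",    &stageSolve);    CHKERRQ(ierr);

    /* everything between Push and Pop is attributed to that stage */
    ierr = PetscLogStagePush(stageAssembly); CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);   CHKERRQ(ierr);
    ierr = PetscLogStagePop(); CHKERRQ(ierr);

    ierr = PetscLogStagePush(stageSolve); CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
    ierr = PetscLogStagePop(); CHKERRQ(ierr);

With stages registered this way, -log_summary prints a separate event table per stage, which makes it easier to confirm that the scaling plateau comes from the solve itself rather than from assembly or I/O.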
> > > > ========================================================= > Begin the solving: > ========================================================= > The current time is: Mon Dec 20 18:02:28 2010 > > KSP Object: > type: bicg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-07, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: > type: jacobi > linear system matrix = precond matrix: > Matrix Object: > type=mpisbaij, rows=1177754, cols=1177754 > total: nonzeros=49908476, allocated nonzeros=49908476 > block size is 1 > > norm(b-Ax)=1.15892e-06 > Norm of error 1.15892e-06, Iterations 1497 > ========================================================= > The solver has finished successfully! > ========================================================= > The solving time is 337.91 seconds. > The time accuracy is 1e-06 second. > The current time is Mon Dec 20 18:08:06 2010 > > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny > Mon Dec 20 19:08:06 2010 > Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 > > Max Max/Min Avg Total > Time (sec): 3.534e+02 1.00001 3.534e+02 > Objects: 3.000e+01 1.00000 3.000e+01 > Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 > Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 > MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 > MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 > MPI Reductions: 4.549e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 > 100.0% 1.819e+05 100.0% 4.533e+03 99.6% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 > 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 > MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 > 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 > MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 > 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 > 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 > VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 > 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 > VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 > VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 > VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 > VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 > 0.0e+00 1 0100100 0 1 0100100 0 0 > VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 > KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 > 4.5e+03 92100100100 99 92100100100 99 1893 > PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 3 3 42424600 0 > Vec 18 18 7924896 0 > Vec Scatter 2 2 1736 0 > Index Set 4 4 247632 0 > Krylov Solver 1 1 832 0 > Preconditioner 1 1 872 0 > Viewer 1 1 544 0 > > ======================================================================================================================== > Average time to get PetscTime(): 6.10352e-06 > Average time for MPI_Barrier(): 0.000129986 > Average time for zero size MPI_Send(): 2.08169e-05 > #PETSc Option Table entries: > -ksp_type bicg > -log_summary > -pc_type jacobi > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Nov 23 15:54:45 2010 > Configure options: --known-level1-dcache-size=65536 > --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 > --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 > --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 > --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 > --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 > --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc > --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 > --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 > --download-parmetis=1 --download-mumps=1 --download-scalapack=1 > --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch > --known-mpi-shared=1 > ----------------------------------------- > > > > > On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote: > >> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: >> >>> >>> Hi everyone, >>> >>> >>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix A >>> and right hand vector b are read from files. The dimension of A is >>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been >>> read correctly. >>> >>> I compiled the program with optimized version (--with-debugging=0), >>> tested the speed up performance on two servers, and I have found that the >>> performance is very poor. >>> >>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total >>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 >>> cores. >>> >>> On each of them, with the increasing of computing cores k from 1 to 8 >>> (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up >>> will increase from 1 to 6, but when the computing cores k increase from 9 to >>> 16(for the first server) or 48 (for the second server), the speed up >>> decrease firstly and then remains a constant value 5.0 (for the first >>> server) or 4.5(for the second server). >>> >> >> We cannot say anything at all without -log_summary data for your runs. >> >> Matt >> >> >>> Actually, the program LAMMPS speed up excellently on these two servers. >>> >>> Any comments are very appreciated! Thanks! >>> >>> >>> >>> >>> -------------------------------------------------------------------------------------------------------------------------- >>> >>> PS: the related codes are as following, >>> >>> >>> //firstly read A and b from files >>> >>> ... 
>>> >>> //then >>> >>> >>> >>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> >>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> >>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr); >>> >>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr); >>> >>> >>> >>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); >>> CHKERRQ(ierr); >>> >>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); >>> >>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >>> >>> >>> >>> ierr = >>> KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >>> >>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); >>> >>> ierr = >>> KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); >>> >>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>> >>> >>> >>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >>> >>> >>> >>> ierr = >>> KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>> >>> >>> >>> ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); >>> >>> >>> >>> ierr = VecAssemblyBegin(x);CHKERRQ(ierr); >>> >>> ierr = VecAssemblyEnd(x);CHKERRQ(ierr); >>> >>> ... >>> >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > Dr.Yongjun Chen > Room 2507, Building M > Institute of Materials Science and Technology > Technical University of Hamburg-Harburg > Ei?endorfer Stra?e 42, 21073 Hamburg, Germany. > Tel: +49 (0)40-42878-4386 > Fax: +49 (0)40-42878-4070 > E-mail: yjxd.chen at gmail.com > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From yjxd.chen at gmail.com Mon Dec 20 15:59:20 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 22:59:20 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and see what I can get. Yongjun On Mon, Dec 20, 2010 at 8:21 PM, Matthew Knepley wrote: > On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen wrote: > >> Hi Matt, >> >> Thanks for your reply. Just now I have carried out a series of tests with >> k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary >> option. From 8 cores to 12 cores, a small speed up has been found this time, >> but from 12 cores to 16 cores, the computation time increase! >> Attached please find these 5 log files. Thank you very much! >> > > Its very clear from these, but Barry was right in his reply. These are > memory bandwidth limited > computations, so if you don't get any more bandwidth you will not speed up. > This is rarely mentioned > in sales pitches for multicore computers. LAMMPS is not limited by > bandwidth for most computations. > > Matt > > >> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg >> -log_summary >> Here, I use ksp bicg instead of gmres, because the two ksp gives almost >> the same speed up performance, as I have tried many times. >> ---------------------- >> (1) k=2 >> ---------------------- >> Process 1 of total 2 on wmss04 >> Process 0 of total 2 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. 
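A rough back-of-the-envelope check of the memory-bandwidth argument above, using only numbers that appear in these logs plus one assumption: each stored nonzero costs about 12 bytes of matrix traffic per MatMult (an 8-byte double plus a 4-byte column index), which is an estimate, not something PETSc reports.

    flops per MatMult        ~ 4 * 49,908,476 stored nonzeros   ~ 2.0e8
    aggregate MatMult rate   ~ 2.0e3 Mflop/s at 8-16 cores (per the tables)
    MatMults per second      ~ 2.0e9 / 2.0e8                    ~ 10
    matrix bytes per MatMult ~ 12 * 49.9e6                      ~ 0.6 GB
    sustained matrix traffic ~ 10 * 0.6 GB                      ~ 6 GB/s or more

That is of the order of what a single memory bus of this vintage can stream, and MatMult plus MatMultTranspose account for roughly 70-80% of the solve time in every run, so once that bandwidth is saturated, adding more cores on the same box cannot shorten the solve; this is consistent with the plateau seen in the timings below.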
>> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:42:23 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.25862e-06 >> Norm of error 1.25862e-06, Iterations 1475 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 762.874 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:55:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny >> Mon Dec 20 18:55:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 8.160e+02 1.00000 8.160e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 >> Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 >> MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 >> MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 >> MPI Reductions: 4.483e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 >> 100.0% 3.245e+05 100.0% 4.467e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 >> MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 >> MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 >> 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 >> VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 >> VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 >> VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 >> VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 >> VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 >> 4.4e+03 92100100100 99 92100100100 99 811 >> PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 339744648 0 >> Vec 18 18 62239872 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 974736 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.21593e-06 >> Average time for MPI_Barrier(): 1.44005e-05 >> Average time for zero size MPI_Send(): 1.94311e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 >> Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 >> 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux >> Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized >> Using PETSc arch: linux-gnu-c-opt >> ----------------------------------------- >> Using C compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> ----------------------------------------- >> Using include paths: >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> ------------------------------------------ >> Using C linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> Using libraries: >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx >> -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord >> -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 >> -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib >> -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t >> 
-L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib >> -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 >> -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich >> -lpthread -lrt -lgcc_s -ldl >> ------------------------------------------ >> >> >> ---------------------- >> (2) k=4 >> ---------------------- >> Process 0 of total 4 on wmss04 >> Process 2 of total 4 on wmss04 >> Process 3 of total 4 on wmss04 >> Process 1 of total 4 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:33:24 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28342e-06 >> Norm of error 1.28342e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 450.583 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:40:55 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny >> Mon Dec 20 18:40:55 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 4.807e+02 1.00000 4.807e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 >> Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 >> MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 >> MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 >> 100.0% 2.658e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. 
>> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 >> MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 >> MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 >> 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 >> VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 >> VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 >> VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 >> VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 >> VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 >> KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 >> 4.4e+03 91100100100 99 91100100100 99 1386 >> PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. 
>> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 169902696 0 >> Vec 18 18 31282096 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 638616 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.5974e-06 >> Average time for MPI_Barrier(): 3.48091e-05 >> Average time for zero size MPI_Send(): 1.8537e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (3) k=8 >> ---------------------- >> Process 0 of total 8 on wmss04 >> Process 4 of total 8 on wmss04 >> Process 2 of total 8 on wmss04 >> Process 6 of total 8 on wmss04 >> Process 3 of total 8 on wmss04 >> Process 7 of total 8 on wmss04 >> Process 1 of total 8 on wmss04 >> Process 5 of total 8 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:14:59 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.32502e-06 >> Norm of error 1.32502e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 311.937 seconds. >> The time accuracy is 1e-06 second. 
>> The current time is Mon Dec 20 18:20:11 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny >> Mon Dec 20 19:20:11 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.330e+02 1.00000 3.330e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 >> Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 >> MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 >> MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 >> 100.0% 2.430e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 >> MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 >> MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 >> VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 >> VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 >> VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 >> VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 >> VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 >> KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 >> 4.4e+03 90100100100 99 90100100100 99 2024 >> PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 84944064 0 >> Vec 18 18 15741712 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 409008 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.38554e-06 >> Average time for MPI_Barrier(): 7.40051e-05 >> Average time for zero size MPI_Send(): 1.88947e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (4) k=12 >> ---------------------- >> Process 1 of total 12 on wmss04 >> Process 5 of total 12 on wmss04 >> Process 2 of total 12 on wmss04 >> Process 9 of total 12 on wmss04 >> Process 6 of total 12 on wmss04 >> Process 7 of total 12 on wmss04 >> Process 10 of total 12 on wmss04 >> Process 3 of total 12 on wmss04 >> Process 11 of total 12 on wmss04 >> Process 4 of total 12 on wmss04 >> Process 8 of total 12 on wmss04 >> Process 0 of total 12 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:56:36 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28414e-06 >> Norm of error 1.28414e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 291.503 seconds. 
>> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:01:28 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny >> Mon Dec 20 19:01:28 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.089e+02 1.00012 3.089e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 >> Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 >> MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 >> MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 >> 100.0% 2.345e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 >> MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 >> MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 >> VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 >> VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 >> VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 >> VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 >> VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 >> KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 >> 4.4e+03 91100100100 99 91100100100 99 2173 >> PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 56593044 0 >> Vec 18 18 10534536 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 305424 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.48499e-06 >> Average time for MPI_Barrier(): 0.000102377 >> Average time for zero size MPI_Send(): 2.15967e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> ---------------------- >> (5) k=16 >> ---------------------- >> Process 0 of total 16 on wmss04 >> Process 8 of total 16 on wmss04 >> Process 4 of total 16 on wmss04 >> Process 12 of total 16 on wmss04 >> Process 2 of total 16 on wmss04 >> Process 6 of total 16 on wmss04 >> Process 5 of total 16 on wmss04 >> Process 11 of total 16 on wmss04 >> Process 14 of total 16 on wmss04 >> Process 7 of total 16 on wmss04 >> Process Process 15 of total 16 on wmss04 >> 3Process 13 of total 16 on wmss04 >> Process 10 of total 16 on wmss04 >> Process 9 of total 16 on wmss04 >> Process 1 of total 16 on wmss04 >> The dimension of Matrix A is n = 1177754 >> of total 16 on wmss04 >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly.End Assembly. >> End Assembly.End Assembly.End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. 
>> >> >> >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:02:28 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.15892e-06 >> Norm of error 1.15892e-06, Iterations 1497 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 337.91 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:08:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny >> Mon Dec 20 19:08:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.534e+02 1.00001 3.534e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 >> Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 >> MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 >> MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 >> MPI Reductions: 4.549e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 >> 100.0% 1.819e+05 100.0% 4.533e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 >> MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 >> MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 >> 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 >> VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 >> VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 >> VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 >> VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 >> VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 >> KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 >> 4.5e+03 92100100100 99 92100100100 99 1893 >> PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 42424600 0 >> Vec 18 18 7924896 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 247632 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.10352e-06 >> Average time for MPI_Barrier(): 0.000129986 >> Average time for zero size MPI_Send(): 2.08169e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> >> On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley wrote: >> >>> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen wrote: >>> >>>> >>>> Hi everyone, >>>> >>>> >>>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix >>>> A and right hand vector b are read from files. The dimension of A is >>>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been >>>> read correctly. >>>> >>>> I compiled the program with optimized version (--with-debugging=0), >>>> tested the speed up performance on two servers, and I have found that the >>>> performance is very poor. >>>> >>>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total >>>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 >>>> cores. >>>> >>>> On each of them, with the increasing of computing cores k from 1 to 8 >>>> (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up >>>> will increase from 1 to 6, but when the computing cores k increase from 9 to >>>> 16(for the first server) or 48 (for the second server), the speed up >>>> decrease firstly and then remains a constant value 5.0 (for the first >>>> server) or 4.5(for the second server). >>>> >>> >>> We cannot say anything at all without -log_summary data for your runs. >>> >>> Matt >>> >>> >>>> Actually, the program LAMMPS speed up excellently on these two >>>> servers. >>>> >>>> Any comments are very appreciated! Thanks! >>>> >>>> >>>> >>>> >>>> -------------------------------------------------------------------------------------------------------------------------- >>>> >>>> PS: the related codes are as following, >>>> >>>> >>>> //firstly read A and b from files >>>> >>>> ... 
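/* Illustrative sketch only, not the poster's elided loading code: one common way
   to fill the "read A and b from files" step with the PETSc 3.1 calling sequence
   used in this thread, assuming A and b were written earlier with MatView()/VecView()
   into a binary viewer.  The file name "system.dat" and the MATMPIAIJ type are
   assumptions; PETSc releases after 3.1 changed the calling sequence to
   MatLoad(A,viewer) / VecLoad(b,viewer).  The A, b and ierr names are reused from
   the surrounding fragment. */
PetscViewer fd;
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"system.dat",FILE_MODE_READ,&fd);CHKERRQ(ierr);
ierr = MatLoad(fd,MATMPIAIJ,&A);CHKERRQ(ierr);    /* creates and fills A */
ierr = VecLoad(fd,PETSC_NULL,&b);CHKERRQ(ierr);   /* creates and fills b */
ierr = PetscViewerDestroy(fd);CHKERRQ(ierr);      /* destroy-by-value form used in 3.1 */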
>>>> >>>> //then >>>> >>>> >>>> >>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); >>>> >>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >>>> >>>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); >>>> >>>> ierr = >>>> KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); >>>> >>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = VecAssemblyBegin(x);CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(x);CHKERRQ(ierr); >>>> >>>> ... >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> >> >> -- >> Dr.Yongjun Chen >> Room 2507, Building M >> Institute of Materials Science and Technology >> Technical University of Hamburg-Harburg >> Ei?endorfer Stra?e 42, 21073 Hamburg, Germany. >> Tel: +49 (0)40-42878-4386 >> Fax: +49 (0)40-42878-4070 >> E-mail: yjxd.chen at gmail.com >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Dec 20 16:04:37 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 20 Dec 2010 16:04:37 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: On Mon, 20 Dec 2010, Yongjun Chen wrote: > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and > see what I can get. hydra is just the process manager. Also --download-mpich uses a slightly older version - with device=ch3:sock for portability and valgrind reasons [development] You might want to install latest mpich manually with the defaut device=ch3:nemsis and recheck.. satish From yjxd.chen at gmail.com Mon Dec 20 16:12:01 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Mon, 20 Dec 2010 23:12:01 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Satish, many thanks for your advice! On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote: > On Mon, 20 Dec 2010, Yongjun Chen wrote: > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly > and > > see what I can get. > > hydra is just the process manager. > > Also --download-mpich uses a slightly older version - with > device=ch3:sock for portability and valgrind reasons [development] > > You might want to install latest mpich manually with the defaut > device=ch3:nemsis and recheck.. > > satish > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thomas.witkowski at tu-dresden.de Tue Dec 21 07:49:58 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Tue, 21 Dec 2010 14:49:58 +0100 Subject: [petsc-users] ParMETIS question Message-ID: <4D10B086.2090503@tu-dresden.de> Hi, I have a not directly PETSc related question, but I hope to get some answer from the community here. In my FEM code, I make use of ParMETIS to partition the mesh. I make direct use of this library and not of PETSc's ParMETIS integration. The initial partition is always fine, but I use the ParMETIS_V3_AdaptiveRepart function for repartition the mesh due to local mesh adaption. In most cases, the result is fine, but there are two points, where I have trouble with: 1) Sometimes ParMETIS generates empty partitions, i.e., a processor has zero mesh elements. This is something my code cannot handle. Is this a bug or a feature? If it is a feature, is there any possiblity to disable it? 2) In most cases the specific partitions are not connected. If I put all data to ParMETIS in a correct way, is this okay? My code can handle it, but is slows down the computation due to larger interior boundaries and therefore to more communications. Does anyone of you know an answer to these question? Is there a debug mode in ParMETIS, where I can see which data is set to its function calls? Regards, Thomas From knepley at gmail.com Tue Dec 21 10:31:01 2010 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Dec 2010 08:31:01 -0800 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D10B086.2090503@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> Message-ID: On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Hi, > > I have a not directly PETSc related question, but I hope to get some answer > from the community here. In my FEM code, I make use of ParMETIS to partition > the mesh. I make direct use of this library and not of PETSc's ParMETIS > integration. The initial partition is always fine, but I use the > ParMETIS_V3_AdaptiveRepart function for repartition the mesh due to local > mesh adaption. In most cases, the result is fine, but there are two points, > where I have trouble with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a processor has > zero mesh elements. This is something my code cannot handle. Is this a bug > or a feature? If it is a feature, is there any possiblity to disable it? > ParMetis has a balance constraint if you weight vertices. This will enforce equal size partitions. > 2) In most cases the specific partitions are not connected. If I put all > data to ParMETIS in a correct way, is this okay? My code can handle it, but > is slows down the computation due to larger interior boundaries and > therefore to more communications. > ParMetis minimizes the overall boundary size, so I do not understand how you could see this slowdown. Matt > Does anyone of you know an answer to these question? Is there a debug mode > in ParMETIS, where I can see which data is set to its function calls? > > Regards, > > Thomas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Tue Dec 21 12:53:52 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 12:53:52 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. 
Message-ID: Hi all, I am running a linear problem discretized with FEM on a diffusion reaction system, with discontinuous source distribution. When I run FGMRes with geometric multigrid as its preconditioner, I notice that every time after the restart in fgmres, the new residual is orders of magnitude higher than the previous iteration. I might be wrong on this but should the restart not preserve monotonicity in convergence ? Or am I thinking of a different variant of Gmres here. Here's the residual norm as a function of iteration with number of restarts=50. 40 KSP Residual norm 2.489810374358e-06 41 KSP Residual norm 1.585813670005e-06 42 KSP Residual norm 1.059211836025e-06 43 KSP Residual norm 6.701461059247e-07 44 KSP Residual norm 4.127634824940e-07 45 KSP Residual norm 2.511364148934e-07 46 KSP Residual norm 1.307034672896e-07 47 KSP Residual norm 7.105770015635e-08 48 KSP Residual norm 4.098578230710e-08 49 KSP Residual norm 2.426160176080e-08 ------------------------------------------------------------------------- 50 KSP Residual norm 1.864914790828e+02 51 KSP Residual norm 6.741080961009e+01 52 KSP Residual norm 5.191621875736e+01 53 KSP Residual norm 4.513782866249e+01 54 KSP Residual norm 3.320195603375e+01 55 KSP Residual norm 2.699941296855e+01 56 KSP Residual norm 1.707998091297e+01 57 KSP Residual norm 1.219599670348e+01 Any suggestions to obtain a more smoother convergence would be much appreciated. Thank you, Vijay From jed at 59A2.org Tue Dec 21 13:10:38 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 20:10:38 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 19:53, Vijay S. Mahadevan wrote: > I am running a linear problem discretized with FEM on a diffusion > reaction system, with discontinuous source distribution. When I run > FGMRes with geometric multigrid as its preconditioner, I notice that > every time after the restart in fgmres, the new residual is orders of > magnitude higher than the previous iteration. I might be wrong on this > but should the restart not preserve monotonicity in convergence ? Or > am I thinking of a different variant of Gmres here. > It is not possible to guarantee monotonicity for nonsymmetric matrices without storing the full subspace. There is no variant of GMRES, or any Krylov method for that matter, that can do what you want. You are seeing a particularly large jump, if you actually have a linear preconditioner (if you don't use Krylov cycles inside your smoothers) then you might try using bcgs or some variant thereof which would avoid the high cost of restart. Or you could stop using restarts, it looks like you were getting close to an adequate tolerance. Or find a way to make the preconditioner strong enough to converge in a reasonable number of iterations. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 21 14:04:07 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:04:07 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. 
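One way to see what the solve has really achieved, independent of whatever norm the Krylov method reports, is to form ||b - A x|| explicitly after KSPSolve(). A minimal sketch, reusing the A, b, x and ierr names from the code fragment quoted earlier in this thread (the work vector r and the two norm variables are new names introduced here for illustration), is:

Vec       r;
PetscReal rnorm,bnorm;
ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
ierr = MatMult(A,x,r);CHKERRQ(ierr);              /* r = A x     */
ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);           /* r = b - A x */
ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);
ierr = VecNorm(b,NORM_2,&bnorm);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD,"||b - A x|| = %g   ||b - A x||/||b|| = %g\n",
                   (double)rnorm,(double)(rnorm/bnorm));CHKERRQ(ierr);
ierr = VecDestroy(r);CHKERRQ(ierr);               /* PETSc 3.1 destroy-by-value form */

The -ksp_monitor_true_residual option mentioned below prints essentially this quantity at every iteration.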
Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. Barry On Dec 21, 2010, at 12:53 PM, Vijay S. Mahadevan wrote: > Hi all, > > I am running a linear problem discretized with FEM on a diffusion > reaction system, with discontinuous source distribution. When I run > FGMRes with geometric multigrid as its preconditioner, I notice that > every time after the restart in fgmres, the new residual is orders of > magnitude higher than the previous iteration. I might be wrong on this > but should the restart not preserve monotonicity in convergence ? Or > am I thinking of a different variant of Gmres here. Here's the > residual norm as a function of iteration with number of restarts=50. > > 40 KSP Residual norm 2.489810374358e-06 > 41 KSP Residual norm 1.585813670005e-06 > 42 KSP Residual norm 1.059211836025e-06 > 43 KSP Residual norm 6.701461059247e-07 > 44 KSP Residual norm 4.127634824940e-07 > 45 KSP Residual norm 2.511364148934e-07 > 46 KSP Residual norm 1.307034672896e-07 > 47 KSP Residual norm 7.105770015635e-08 > 48 KSP Residual norm 4.098578230710e-08 > 49 KSP Residual norm 2.426160176080e-08 > ------------------------------------------------------------------------- > 50 KSP Residual norm 1.864914790828e+02 > 51 KSP Residual norm 6.741080961009e+01 > 52 KSP Residual norm 5.191621875736e+01 > 53 KSP Residual norm 4.513782866249e+01 > 54 KSP Residual norm 3.320195603375e+01 > 55 KSP Residual norm 2.699941296855e+01 > 56 KSP Residual norm 1.707998091297e+01 > 57 KSP Residual norm 1.219599670348e+01 > > Any suggestions to obtain a more smoother convergence would be much > appreciated. Thank you, > > Vijay From jed at 59A2.org Tue Dec 21 14:08:01 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:08:01 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: > This is a sign that the preconditioner is seriously messed up and should > not be used in its current form. It can happen if the matrix is nearly > singular and for example you use an incomplete factorization for a > preconditioner that just screws up the scaling like totally. Run with > -ksp_monitor_true_residual and you'll see that the solver is not really > solving the problem even though it thinks it is converging fine. FGMRES only does right preconditioning so it should be showing the true residual. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Tue Dec 21 14:16:25 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:16:25 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: Jed, I ask because after the restart, the residual changes 10 orders of magnitude and a-priori, it is quite hard to decide the restart number. Yes in the test case I presented, the residual gets close enough to the tolerance and I can afford few more vector storage but for a much refined problem, this might not be the case and so it worries me. My initial tests with bcgs were not satisfactory (very bad convergence as compared to gmres) but I tried GCR just now and it seems to converge correctly to the right solution, monotonically for the same problem. Alternatively, yes, I could make my preconditioner stronger (add more levels, more smoothing steps etc..) to converge within the restart limit. 
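For reference, the restart length and the Krylov method itself can be changed at run time rather than fixed in the code. A minimal sketch, assuming only the KSP object named ksp from the earlier code fragment (the restart value 200 is purely illustrative), is:

ierr = KSPSetType(ksp,KSPFGMRES);CHKERRQ(ierr);     /* flexible GMRES, as already used here       */
ierr = KSPGMRESSetRestart(ksp,200);CHKERRQ(ierr);   /* enlarge the restart space                  */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);        /* still honour runtime overrides             */

The same settings are available as the command-line options -ksp_type fgmres (or -ksp_type gcr) and -ksp_gmres_restart <n>, so different restart lengths can be compared without recompiling.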
Barry, the matrix is not nearly singular although I have not yet looked at the effectiveness of the preconditoner thoroughly yet. It is possible that the preconditioned operator might have some undesired properties. Just to compare, the same linear system without any preconditioning takes about 2400 iterations and maybe that gives some ball park metric on the efficiency of the preconditioner.. Let me know if you want to know some other specific information to better understand the system. Vijay On Tue, Dec 21, 2010 at 1:10 PM, Jed Brown wrote: > On Tue, Dec 21, 2010 at 19:53, Vijay S. Mahadevan wrote: >> >> I am running a linear problem discretized with FEM on a diffusion >> reaction system, with discontinuous source distribution. When I run >> FGMRes with geometric multigrid as its preconditioner, I notice that >> every time after the restart in fgmres, the new residual is orders of >> magnitude higher than the previous iteration. I might be wrong on this >> but should the restart not preserve monotonicity in convergence ? Or >> am I thinking of a different variant of Gmres here. > > It is not possible to guarantee monotonicity for nonsymmetric matrices > without storing the full subspace. ?There is no variant of GMRES, or any > Krylov method for that matter, that can do what you want. ?You are seeing a > particularly large jump, if you actually have a linear preconditioner (if > you don't use Krylov cycles inside your smoothers) then you might try using > bcgs or some variant thereof which would avoid the high cost of restart. ?Or > you could stop using restarts, it looks like you were getting close to an > adequate tolerance. ?Or find a way to make the preconditioner strong enough > to converge in a reasonable number of iterations. From bsmith at mcs.anl.gov Tue Dec 21 14:23:16 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:23:16 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: > On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: > This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. > > FGMRES only does right preconditioning so it should be showing the true residual. No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". Barry From vijay.m at gmail.com Tue Dec 21 14:26:40 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:26:40 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: Barry, I tried with the true_residual_norm option and it gives me the exact same convergence as the one I have shown before. 
45 KSP Residual norm 2.511364148934e-07 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 46 KSP Residual norm 1.307034672896e-07 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 47 KSP Residual norm 7.105770015635e-08 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 48 KSP Residual norm 4.098578230710e-08 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 49 KSP Residual norm 2.426160176080e-08 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 50 KSP Residual norm 1.864914790828e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 51 KSP Residual norm 6.741080961009e+01 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 52 KSP Residual norm 5.191621875736e+01 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 53 KSP Residual norm 4.513782866249e+01 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 Vijay On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: > > On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: > >> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >> >> FGMRES only does right preconditioning so it should be showing the true residual. > > ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". > > ? 
Barry > > > From jed at 59A2.org Tue Dec 21 14:28:14 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:28:14 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 21:16, Vijay S. Mahadevan wrote: > Jed, I ask because after the restart, the residual changes 10 orders > of magnitude and a-priori, it is quite hard to decide the restart > number. Yes in the test case I presented, the residual gets close > enough to the tolerance and I can afford few more vector storage but > for a much refined problem, this might not be the case and so it > worries me. > What happens if you run with -ksp_gmres_modifiedgramschmidt? This is slow in parallel, but provides insight into what is causing the problem. My initial tests with bcgs were not satisfactory (very bad convergence > as compared to gmres) but I tried GCR just now and it seems to > converge correctly to the right solution, monotonically for the same > problem. > GCR provides a cheap way to access the solution, see what it does with monitor_true_residual. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Dec 21 14:28:34 2010 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 21 Dec 2010 12:28:34 -0800 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: Message-ID: Vijay, You should definitely follow Barry's suggestion and monitor the true residual (using -ksp_monitor_true_residual). Jed, I know this may sound odd given FGMRES uses right preconditioning, but I've seen that when the system is badly scaled, the preconditioned residual and the true residual reported by -ksp_monitor_true_residual can drift from one another. In such situations with the reported numbers are initially the same, but after a number of iterations, the preconditioned residual may continue to decrease but the true residual actually stagnates. Cheers, Dave On 21 December 2010 12:08, Jed Brown wrote: > On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >> >> This is a sign that the preconditioner is seriously messed up and should >> not be used in its current form. ?It can happen if the matrix is nearly >> singular and for example you use an incomplete factorization for a >> preconditioner that just screws up the scaling like totally. Run with >> -ksp_monitor_true_residual and you'll see that the solver is not really >> solving the problem even though it thinks it is converging fine. > > FGMRES only does right preconditioning so it should be showing the true > residual. From jed at 59A2.org Tue Dec 21 14:29:31 2010 From: jed at 59A2.org (Jed Brown) Date: Tue, 21 Dec 2010 21:29:31 +0100 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: On Tue, Dec 21, 2010 at 21:26, Vijay S. Mahadevan wrote: > 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm > 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 > The true residual is huge, this is not converging. You probably have a singular preconditioner. What are you using (-ksp_view) and what system are you solving with what discretization. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 21 14:30:25 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:30:25 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. 
In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> Message-ID: <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. Barry On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: > Barry, I tried with the true_residual_norm option and it gives me the > exact same convergence as the one I have shown before. > > 45 KSP Residual norm 2.511364148934e-07 > 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm > 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 > 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm > 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 > 46 KSP Residual norm 1.307034672896e-07 > 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm > 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 > 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm > 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 > 47 KSP Residual norm 7.105770015635e-08 > 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm > 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 > 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm > 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 > 48 KSP Residual norm 4.098578230710e-08 > 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm > 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 49 KSP Residual norm 2.426160176080e-08 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 50 KSP Residual norm 1.864914790828e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 51 KSP Residual norm 6.741080961009e+01 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > 52 KSP Residual norm 5.191621875736e+01 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm > 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 > 53 KSP Residual norm 4.513782866249e+01 > 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm > 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 > 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm > 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 > > > Vijay > > On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >> >> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >> >>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. 
Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>> >>> FGMRES only does right preconditioning so it should be showing the true residual. >> >> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >> >> Barry >> >> >> From vijay.m at gmail.com Tue Dec 21 14:46:20 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 14:46:20 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: > Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. Ah yes. I was reading the output wrongly. Thanks for pointing that out. So then it is quite possible that my preconditioner is terrible for this problem. Curiously with GCR, the true residual does converge. 62 KSP Residual norm 6.845396874593e-10 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 63 KSP Residual norm 4.617426258215e-10 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 64 KSP Residual norm 3.659090331422e-10 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 65 KSP Residual norm 2.457005532004e-10 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 66 KSP Residual norm 1.765446010945e-10 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 Jed, with modified gram schmidt procedure, fgmres yields the following, which looks like the same as before: 49 KSP Residual norm 2.426160176080e-08 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 50 KSP Residual norm 1.864914790828e+02 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 51 KSP Residual norm 6.741080961009e+01 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 But I generally see that the true residual of GCR seems to converge to desired tolerance but for GMRES, the convergence stagnates with different options on my MG preconditioner. 
This is puzzling to me since I spent enough time making sure that the preconditioner was working correctly but I will look more into this now. Thanks for all the helpful comments guys ! I will post here if I find any other curious behavior. Vijay On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: > > ?Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. > > ?Barry > > On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: > >> Barry, I tried with the true_residual_norm option and it gives me the >> exact same convergence as the one I have shown before. >> >> 45 KSP Residual norm 2.511364148934e-07 >> ? 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >> 46 KSP Residual norm 1.307034672896e-07 >> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >> 47 KSP Residual norm 7.105770015635e-08 >> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >> 48 KSP Residual norm 4.098578230710e-08 >> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> 49 KSP Residual norm 2.426160176080e-08 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> 50 KSP Residual norm 1.864914790828e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> 51 KSP Residual norm 6.741080961009e+01 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> 52 KSP Residual norm 5.191621875736e+01 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >> 53 KSP Residual norm 4.513782866249e+01 >> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >> ? 
54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >> >> >> Vijay >> >> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>> >>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>> >>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>> >>>> FGMRES only does right preconditioning so it should be showing the true residual. >>> >>> ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>> >>> ? Barry >>> >>> >>> > > From bsmith at mcs.anl.gov Tue Dec 21 14:52:20 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 14:52:20 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). Barry On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: >> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. > Ah yes. I was reading the output wrongly. Thanks for pointing that > out. So then it is quite possible that my preconditioner is terrible > for this problem. > > Curiously with GCR, the true residual does converge. 
> > 62 KSP Residual norm 6.845396874593e-10 > 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm > 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 > 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm > 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 > 63 KSP Residual norm 4.617426258215e-10 > 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm > 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 > 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm > 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 > 64 KSP Residual norm 3.659090331422e-10 > 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm > 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 > 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm > 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 > 65 KSP Residual norm 2.457005532004e-10 > 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm > 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 > 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm > 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 > 66 KSP Residual norm 1.765446010945e-10 > 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm > 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 > > Jed, with modified gram schmidt procedure, fgmres yields the > following, which looks like the same as before: > > 49 KSP Residual norm 2.426160176080e-08 > 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm > 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 50 KSP Residual norm 1.864914790828e+02 > 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm > 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 51 KSP Residual norm 6.741080961009e+01 > 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm > 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 > 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm > 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 > > But I generally see that the true residual of GCR seems to converge to > desired tolerance but for GMRES, the convergence stagnates with > different options on my MG preconditioner. This is puzzling to me > since I spent enough time making sure that the preconditioner was > working correctly but I will look more into this now. Thanks for all > the helpful comments guys ! I will post here if I find any other > curious behavior. > > Vijay > > On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >> >> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >> >> Barry >> >> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >> >>> Barry, I tried with the true_residual_norm option and it gives me the >>> exact same convergence as the one I have shown before. 
>>> >>> 45 KSP Residual norm 2.511364148934e-07 >>> 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>> 46 KSP Residual norm 1.307034672896e-07 >>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>> 47 KSP Residual norm 7.105770015635e-08 >>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>> 48 KSP Residual norm 4.098578230710e-08 >>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 49 KSP Residual norm 2.426160176080e-08 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 50 KSP Residual norm 1.864914790828e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 51 KSP Residual norm 6.741080961009e+01 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> 52 KSP Residual norm 5.191621875736e+01 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>> 53 KSP Residual norm 4.513782866249e+01 >>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>> 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>> >>> >>> Vijay >>> >>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>> >>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>> >>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>> >>>>> FGMRES only does right preconditioning so it should be showing the true residual. >>>> >>>> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. 
So in extreme circumstances the recursively compute one generates "garbage". >>>> >>>> Barry >>>> >>>> >>>> >> >> From vijay.m at gmail.com Tue Dec 21 15:16:37 2010 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Tue, 21 Dec 2010 15:16:37 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: Also GCR seems to use and allocate comparatively more vectors, translating to lot more memory. This does make FGMRES more attractive. I will look at the preconditioner and try to find the true cause of the issue. Cheers, Vijay On Tue, Dec 21, 2010 at 2:52 PM, Barry Smith wrote: > > ?The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). > > ? Barry > > > On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: > >>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >> Ah yes. I was reading the output wrongly. Thanks for pointing that >> out. So then it is quite possible that my preconditioner is terrible >> for this problem. >> >> Curiously with GCR, the true residual does converge. >> >> 62 KSP Residual norm 6.845396874593e-10 >> ? 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm >> 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 >> ? 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >> 63 KSP Residual norm 4.617426258215e-10 >> ? 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >> ? 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >> 64 KSP Residual norm 3.659090331422e-10 >> ? 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >> ? 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >> 65 KSP Residual norm 2.457005532004e-10 >> ? 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >> ? 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >> 66 KSP Residual norm 1.765446010945e-10 >> ? 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >> >> Jed, with modified gram schmidt procedure, fgmres yields the >> following, which looks like the same as before: >> >> 49 KSP Residual norm 2.426160176080e-08 >> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> 50 KSP Residual norm 1.864914790828e+02 >> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> 51 KSP Residual norm 6.741080961009e+01 >> ? 
51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >> >> But I generally see that the true residual of GCR seems to converge to >> desired tolerance but for GMRES, the convergence stagnates with >> different options on my MG preconditioner. This is puzzling to me >> since I spent enough time making sure that the preconditioner was >> working correctly but I will look more into this now. Thanks for all >> the helpful comments guys ! I will post here if I find any other >> curious behavior. >> >> Vijay >> >> On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >>> >>> ?Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>> >>> ?Barry >>> >>> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, I tried with the true_residual_norm option and it gives me the >>>> exact same convergence as the one I have shown before. >>>> >>>> 45 KSP Residual norm 2.511364148934e-07 >>>> ? 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>>> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>> 46 KSP Residual norm 1.307034672896e-07 >>>> ? 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>> 47 KSP Residual norm 7.105770015635e-08 >>>> ? 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>> 48 KSP Residual norm 4.098578230710e-08 >>>> ? 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>> 49 KSP Residual norm 2.426160176080e-08 >>>> ? 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>> 50 KSP Residual norm 1.864914790828e+02 >>>> ? 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>> 51 KSP Residual norm 6.741080961009e+01 >>>> ? 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>> 52 KSP Residual norm 5.191621875736e+01 >>>> ? 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>> ? 
53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>> 53 KSP Residual norm 4.513782866249e+01 >>>> ? 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>> ? 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>>> >>>> >>>> Vijay >>>> >>>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>>> >>>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>>> >>>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. ?It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>>> >>>>>> FGMRES only does right preconditioning so it should be showing the true residual. >>>>> >>>>> ?No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>>>> >>>>> ? Barry >>>>> >>>>> >>>>> >>> >>> > > From bsmith at mcs.anl.gov Tue Dec 21 15:34:30 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 15:34:30 -0600 Subject: [petsc-users] Monotonic convergence in FGMRES. In-Reply-To: References: <749AF8F4-5E7F-47CA-9534-EFBC069FC35C@mcs.anl.gov> <54344075-96E2-43E3-B273-C82251B5B735@mcs.anl.gov> Message-ID: <9ABA7B64-06BA-4025-9D57-FA25161929A3@mcs.anl.gov> On Dec 21, 2010, at 3:16 PM, Vijay S. Mahadevan wrote: > Also GCR seems to use and allocate comparatively more vectors, > translating to lot more memory. Yes > This does make FGMRES more attractive. > I will look at the preconditioner and try to find the true cause of > the issue. > > Cheers, > Vijay > > On Tue, Dec 21, 2010 at 2:52 PM, Barry Smith wrote: >> >> The GCR algorithm computes the residual and hence the residual norm EXPLICITLY as part of the solution process, it does not use the recursive formula that FGMES uses. I "think" the use of the recursive formula is why FGMRES is cheaper than GCR (and hence much more commonly used). >> >> Barry >> >> >> On Dec 21, 2010, at 2:46 PM, Vijay S. Mahadevan wrote: >> >>>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>> Ah yes. I was reading the output wrongly. Thanks for pointing that >>> out. So then it is quite possible that my preconditioner is terrible >>> for this problem. >>> >>> Curiously with GCR, the true residual does converge. 
>>> >>> 62 KSP Residual norm 6.845396874593e-10 >>> 62 KSP preconditioned resid norm 6.845396874593e-10 true resid norm >>> 6.845396874593e-10 ||Ae||/||Ax|| 1.063128003731e+00 >>> 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >>> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >>> 63 KSP Residual norm 4.617426258215e-10 >>> 63 KSP preconditioned resid norm 4.617426258215e-10 true resid norm >>> 4.617426258215e-10 ||Ae||/||Ax|| 9.425403350509e-01 >>> 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >>> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >>> 64 KSP Residual norm 3.659090331422e-10 >>> 64 KSP preconditioned resid norm 3.659090331422e-10 true resid norm >>> 3.659090331422e-10 ||Ae||/||Ax|| 1.044433624917e+00 >>> 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >>> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >>> 65 KSP Residual norm 2.457005532004e-10 >>> 65 KSP preconditioned resid norm 2.457005532004e-10 true resid norm >>> 2.457005532004e-10 ||Ae||/||Ax|| 9.250757590415e-01 >>> 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >>> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >>> 66 KSP Residual norm 1.765446010945e-10 >>> 66 KSP preconditioned resid norm 1.765446010945e-10 true resid norm >>> 1.765446010945e-10 ||Ae||/||Ax|| 9.880804659179e-01 >>> >>> Jed, with modified gram schmidt procedure, fgmres yields the >>> following, which looks like the same as before: >>> >>> 49 KSP Residual norm 2.426160176080e-08 >>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 50 KSP Residual norm 1.864914790828e+02 >>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 51 KSP Residual norm 6.741080961009e+01 >>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>> >>> But I generally see that the true residual of GCR seems to converge to >>> desired tolerance but for GMRES, the convergence stagnates with >>> different options on my MG preconditioner. This is puzzling to me >>> since I spent enough time making sure that the preconditioner was >>> working correctly but I will look more into this now. Thanks for all >>> the helpful comments guys ! I will post here if I find any other >>> curious behavior. >>> >>> Vijay >>> >>> On Tue, Dec 21, 2010 at 2:30 PM, Barry Smith wrote: >>>> >>>> Yes but look at the true residual norm it is huge and indicates the residual is not really getting small. >>>> >>>> Barry >>>> >>>> On Dec 21, 2010, at 2:26 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, I tried with the true_residual_norm option and it gives me the >>>>> exact same convergence as the one I have shown before. 
>>>>> >>>>> 45 KSP Residual norm 2.511364148934e-07 >>>>> 45 KSP preconditioned resid norm 2.511364148934e-07 true resid norm >>>>> 1.865039278877e+02 ||Ae||/||Ax|| 2.699481989705e+02 >>>>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>>> 46 KSP Residual norm 1.307034672896e-07 >>>>> 46 KSP preconditioned resid norm 1.307034672896e-07 true resid norm >>>>> 1.864478183180e+02 ||Ae||/||Ax|| 2.724877015479e+02 >>>>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>>> 47 KSP Residual norm 7.105770015635e-08 >>>>> 47 KSP preconditioned resid norm 7.105770015635e-08 true resid norm >>>>> 1.864563163311e+02 ||Ae||/||Ax|| 2.722662760395e+02 >>>>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>>> 48 KSP Residual norm 4.098578230710e-08 >>>>> 48 KSP preconditioned resid norm 4.098578230710e-08 true resid norm >>>>> 1.864560351328e+02 ||Ae||/||Ax|| 2.690284539995e+02 >>>>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>>> 49 KSP Residual norm 2.426160176080e-08 >>>>> 49 KSP preconditioned resid norm 2.426160176080e-08 true resid norm >>>>> 1.864897210364e+02 ||Ae||/||Ax|| 2.696456942624e+02 >>>>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>>> 50 KSP Residual norm 1.864914790828e+02 >>>>> 50 KSP preconditioned resid norm 1.864914790828e+02 true resid norm >>>>> 1.864914790828e+02 ||Ae||/||Ax|| 2.798875072987e+02 >>>>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>>> 51 KSP Residual norm 6.741080961009e+01 >>>>> 51 KSP preconditioned resid norm 6.741080961009e+01 true resid norm >>>>> 6.759768469363e+01 ||Ae||/||Ax|| 1.666964983874e+02 >>>>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>>> 52 KSP Residual norm 5.191621875736e+01 >>>>> 52 KSP preconditioned resid norm 5.191621875736e+01 true resid norm >>>>> 5.146342142561e+01 ||Ae||/||Ax|| 7.225409161988e+01 >>>>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>>> 53 KSP Residual norm 4.513782866249e+01 >>>>> 53 KSP preconditioned resid norm 4.513782866249e+01 true resid norm >>>>> 4.546883708687e+01 ||Ae||/||Ax|| 7.426476446334e+01 >>>>> 54 KSP preconditioned resid norm 3.320195603375e+01 true resid norm >>>>> 3.297361634749e+01 ||Ae||/||Ax|| 5.285029509147e+01 >>>>> >>>>> >>>>> Vijay >>>>> >>>>> On Tue, Dec 21, 2010 at 2:23 PM, Barry Smith wrote: >>>>>> >>>>>> On Dec 21, 2010, at 2:08 PM, Jed Brown wrote: >>>>>> >>>>>>> On Tue, Dec 21, 2010 at 21:04, Barry Smith wrote: >>>>>>> This is a sign that the preconditioner is seriously messed up and should not be used in its current form. It can happen if the matrix is nearly singular and for example you use an incomplete factorization for a preconditioner that just screws up the scaling like totally. Run with -ksp_monitor_true_residual and you'll see that the solver is not really solving the problem even though it thinks it is converging fine. >>>>>>> >>>>>>> FGMRES only does right preconditioning so it should be showing the true residual. 
>>>>>> >>>>>> No because it uses a recursive formula for "computing" the residual norm it does not compute it explicitly. So in extreme circumstances the recursively compute one generates "garbage". >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From gaurish108 at gmail.com Tue Dec 21 22:03:30 2010 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 21 Dec 2010 23:03:30 -0500 Subject: [petsc-users] Reading matrices into PETSc Message-ID: i have this large text file containing a matrix. This text file contains the non-zero entries of a very large sparse matrix The first two columns indicate the position of the non-zero entry and the last column the actual non-zero value it self for example the matrix 1 0 8 0 0 5 6 0 0 is written in the text file in the form of 1 1 1 1 3 8 2 3 5 3 1 6 This is the standard [(row ,column), non-zero] entry format. i want PETSc to load this matrix from the text file i am not sure how to do that. What commands do I use? I am new to PETSc, so some detail in the explanation will be really helpful. Sincere thanks, Gaurish. -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.tabak at tudelft.nl Tue Dec 21 22:09:22 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Wed, 22 Dec 2010 05:09:22 +0100 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <4D1179F2.7020306@tudelft.nl> Gaurish Telang wrote: > i have this large text file containing a matrix. > This text file contains the non-zero entries of a very large sparse > matrix > Read the documentation on MatSetValues function. It is a starting point. From bsmith at mcs.anl.gov Tue Dec 21 22:16:47 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 21 Dec 2010 22:16:47 -0600 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <0B40812B-4A1A-4021-A5D0-1EA21D2F66AC@mcs.anl.gov> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#sparse-matrix-ascii-format Barry On Dec 21, 2010, at 10:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. > > This text file contains the non-zero entries of a very large sparse matrix > > > The first two columns indicate the position of the non-zero entry > > and the last column the actual non-zero value it self > > > for > example the matrix > > 1 0 8 > > 0 0 5 > > 6 0 0 > > is written in the text file in the form of > > 1 1 1 > > 1 3 8 > > 2 3 5 > > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. > i want PETSc to load this matrix > > from the text file > > i am not sure how > > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really helpful. > > Sincere thanks, > > Gaurish. From gaurish108 at gmail.com Wed Dec 22 01:56:33 2010 From: gaurish108 at gmail.com (Gaurish Telang) Date: Wed, 22 Dec 2010 02:56:33 -0500 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. 
> This text file contains the non-zero entries of a very large sparse > matrix > > The first two columns indicate the position of the non-zero entry > and the last column the actual non-zero value it self > > for > example the matrix > 1 0 8 > 0 0 5 > 6 0 0 > is written in the text file in the form of > 1 1 1 > 1 3 8 > 2 3 5 > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. i want > PETSc to load this matrix > from the text file > i am not sure how > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really > helpful. > > Sincere thanks, > > Gaurish. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at mcs.anl.gov Wed Dec 22 03:41:21 2010 From: abhyshr at mcs.anl.gov (Shri) Date: Wed, 22 Dec 2010 03:41:21 -0600 (CST) Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: Message-ID: <1372925860.24678.1293010881196.JavaMail.root@zimbra.anl.gov> Find attached a routine which reads matrix data from an ASCII file in i j value format and creates a seqaij matrix. ----- Original Message ----- I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang < gaurish108 at gmail.com > wrote: i have this large text file containing a matrix. This text file contains the non-zero entries of a very large sparse matrix The first two columns indicate the position of the non-zero entry and the last column the actual non-zero value it self for example the matrix 1 0 8 0 0 5 6 0 0 is written in the text file in the form of 1 1 1 1 3 8 2 3 5 3 1 6 This is the standard [(row ,column), non-zero] entry format. i want PETSc to load this matrix from the text file i am not sure how to do that. What commands do I use? I am new to PETSc, so some detail in the explanation will be really helpful. Sincere thanks, Gaurish. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ReadMatFromFile.c Type: text/x-csrc Size: 2125 bytes Desc: not available URL: From thomas.witkowski at tu-dresden.de Wed Dec 22 03:07:22 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 10:07:22 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: References: <4D10B086.2090503@tu-dresden.de> Message-ID: <4D11BFCA.2030800@tu-dresden.de> Okay, in my computations, I have empty partitions on some ranks and definitely not minimal boundary sizes. So may be I generate a wrong input. But if this is the case, I wonder why the resulting mesh partitioning is quite good. If I neglect the problem of empty partitions, the redistributed mesh leads to a very good load balancing. Is there any meaningful way to debug the problem? Is there something link a "verbose mode" in ParMetis that says me whats happen on the input data? Otherwise I have to print all the input data to the screen and check it by hand. 
Although I have a quite small example with 128 overall coarse mesh elements on 8 ranks, this is not big fun :) Thomas @Matthew: By mistake I've answered your mail directly to you and not to the mailing list, therefore I sent it now here again Matthew Knepley wrote: > On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski > > wrote: > > Hi, > > I have a not directly PETSc related question, but I hope to get > some answer from the community here. In my FEM code, I make use of > ParMETIS to partition the mesh. I make direct use of this library > and not of PETSc's ParMETIS integration. The initial partition is > always fine, but I use the ParMETIS_V3_AdaptiveRepart function for > repartition the mesh due to local mesh adaption. In most cases, > the result is fine, but there are two points, where I have trouble > with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a > processor has zero mesh elements. This is something my code cannot > handle. Is this a bug or a feature? If it is a feature, is there > any possiblity to disable it? > > > ParMetis has a balance constraint if you weight vertices. This will > enforce equal size partitions. > > > 2) In most cases the specific partitions are not connected. If I > put all data to ParMETIS in a correct way, is this okay? My code > can handle it, but is slows down the computation due to larger > interior boundaries and therefore to more communications. > > > ParMetis minimizes the overall boundary size, so I do not understand > how you could see this slowdown. > > Matt > > > Does anyone of you know an answer to these question? Is there a > debug mode in ParMETIS, where I can see which data is set to its > function calls? > > Regards, > > Thomas > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From thomas.witkowski at tu-dresden.de Wed Dec 22 03:12:34 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 10:12:34 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: References: <4D10B086.2090503@tu-dresden.de> <20101221205356.xt2c3cnyockkck88@mail.zih.tu-dresden.de> Message-ID: <4D11C102.8020605@tu-dresden.de> Matthew Knepley wrote: > On Tue, Dec 21, 2010 at 11:53 AM, Thomas Witkowski > > wrote: > > Okay, in my computations, I have empty partitions on some ranks > and definitely not minimal boundary sizes. So may be I generate a > wrong input. But if this is the case, I wonder why the resulting > mesh partitioning is quite good. If I neglect the problem of empty > > > The above statement does not make any sense. You can get perfect load > balancing by just chopping the mesh into > equal parts. You only care about using a mesh partitioner if you want > to minimize the cut size (boundary length, communication, > etc.) My situation is slightly different. I make the partitioning and distribution of the mesh on a coarse level. The coarser elements may be further adapted. For ParMetis, I use the number of fine elements in the coarse mesh elements to weight them. Therefore, I do not get equal parts with respect to the number of coarse mesh elements. But what do not want to have are empty partitions. And in my test case with 128 coarse mesh elements and 8 processes, I get using either ParMETIS_V3_PartMeshKway or ParMETIS_V3_AdaptiveRepart two empty partitions. I wrote a function to print the dual graph of the mesh, and it looks fine. 
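For reference, such a dump does not need to be more than a few lines; a minimal sketch (illustrative only, assuming the usual ParMETIS distributed CSR arrays vtxdist/xadj/adjncy with C-style numbering, a single weight per vertex, and plain int storage; it makes no ParMETIS calls itself) could look like this:

#include <stdio.h>
#include <mpi.h>

/* Print this rank's part of the distributed dual graph in the CSR layout
   that ParMETIS takes: vtxdist (global vertex ranges per rank), xadj
   (row pointers into adjncy), adjncy (global neighbour ids), and the
   optional vertex weights vwgt.  Debugging aid only. */
static void print_dual_graph(MPI_Comm comm, const int *vtxdist,
                             const int *xadj, const int *adjncy,
                             const int *vwgt)
{
  int rank, nlocal, i, j;

  MPI_Comm_rank(comm, &rank);
  nlocal = vtxdist[rank + 1] - vtxdist[rank];   /* number of local vertices */

  for (i = 0; i < nlocal; i++) {
    printf("[%d] vertex %d (weight %d):", rank, vtxdist[rank] + i,
           vwgt ? vwgt[i] : 1);
    for (j = xadj[i]; j < xadj[i + 1]; j++)
      printf(" %d", adjncy[j]);                 /* global ids of neighbours */
    printf("\n");
  }
}

Comparing such a dump against the known coarse-mesh connectivity on a case as small as 128 elements on 8 ranks is usually enough to spot an indexing or weighting mistake.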
Thomas > Matt > > > partitions, the redistributed mesh leads to a very good load > balancing. Is there any meaningful way to debug the problem? Is > there something link a "verbose mode" in ParMetis that says me > whats happen on the input data? Otherwise I have to print all the > input data to the screen and check it by hand. Although I have a > quite small example with 128 overall coarse mesh elements on 8 > ranks, this is not big fun :) > > Thomas > > Zitat von Matthew Knepley >: > > > On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < > thomas.witkowski at tu-dresden.de > > wrote: > > Hi, > > I have a not directly PETSc related question, but I hope > to get some answer > from the community here. In my FEM code, I make use of > ParMETIS to partition > the mesh. I make direct use of this library and not of > PETSc's ParMETIS > integration. The initial partition is always fine, but I > use the > ParMETIS_V3_AdaptiveRepart function for repartition the > mesh due to local > mesh adaption. In most cases, the result is fine, but > there are two points, > where I have trouble with: > > 1) Sometimes ParMETIS generates empty partitions, i.e., a > processor has > zero mesh elements. This is something my code cannot > handle. Is this a bug > or a feature? If it is a feature, is there any possiblity > to disable it? > > > ParMetis has a balance constraint if you weight vertices. This > will enforce > equal size partitions. > > > 2) In most cases the specific partitions are not > connected. If I put all > data to ParMETIS in a correct way, is this okay? My code > can handle it, but > is slows down the computation due to larger interior > boundaries and > therefore to more communications. > > > ParMetis minimizes the overall boundary size, so I do not > understand how you > could see this slowdown. > > Matt > > > Does anyone of you know an answer to these question? Is > there a debug mode > in ParMETIS, where I can see which data is set to its > function calls? > > Regards, > > Thomas > > > > > -- > What most experimenters take for granted before they begin > their experiments > is infinitely more interesting than any results to which their > experiments > lead. > -- Norbert Wiener > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From u.tabak at tudelft.nl Wed Dec 22 03:32:57 2010 From: u.tabak at tudelft.nl (Umut Tabak) Date: Wed, 22 Dec 2010 10:32:57 +0100 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: <4D11C5C9.1050805@tudelft.nl> On 12/22/2010 08:56 AM, Gaurish Telang wrote: > but I am not yet clear on how to do this. I read ex32.c and ex72.c but > I am still confused. What is ASCII 'slap' format? How should matrix > be supplied to PETSc? > > My matrix is a 2000x1900 matrix given in the format MATLAB stores > sparse matrices. i.e What exactly do you want to do, here is a simple example for Compressed Row Storage (CSR) format which is the default format in PETSc as far as I know. You can use the MatSetValues function to set the nonzeros, Have you tried to read the documentation about the matrices in the user manual? 
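If the goal is just to get that "i j value" text file into a Mat, a rough, untested sketch along the following lines may be a useful starting point (it assumes a serial run, 1-based indices in the file, a hypothetical file name matrix.txt, and a made-up preallocation guess of 30 nonzeros per row; a real code should count the nonzeros per row first and preallocate exactly):

#include <stdio.h>
#include "petscmat.h"

int main(int argc, char **argv)
{
  Mat            A;
  FILE           *fp;
  PetscInt       m = 2000, n = 1900, row, col;
  int            fi, fj;
  double         val;
  PetscScalar    v;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0);CHKERRQ(ierr);

  /* rough preallocation guess of 30 nonzeros per row */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 30, PETSC_NULL, &A);CHKERRQ(ierr);

  fp = fopen("matrix.txt", "r");            /* hypothetical file name */
  if (!fp) { PetscPrintf(PETSC_COMM_SELF, "cannot open matrix.txt\n"); return 1; }
  while (fscanf(fp, "%d %d %lf", &fi, &fj, &val) == 3) {
    row = fi - 1; col = fj - 1;             /* file is 1-based, PETSc is 0-based */
    v   = val;
    ierr = MatSetValues(A, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* the assembled matrix could now be written once in PETSc binary format
     (PetscViewerBinaryOpen + MatView) and re-read later with MatLoad,
     which is typically much faster than re-parsing the ASCII file */
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}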
Matlab pseudo code: A = diag([1 1 1]). So the table is something like this:

i j val
0 0 1
1 1 1
2 2 1

see this:

Mat A; // of course, A should be created first and assembled after the set operation
PetscInt m = 3;
PetscInt n = 3;
PetscInt indxm[] = {0, 1, 2};
PetscInt indxn[] = {0, 1, 2};
PetscScalar vals[] = {1., 1., 1.};
MatSetValues(A, m, indxm, n, indxn, vals, INSERT_VALUES); // assemble after

HTH, Umut -- - Hope is a good thing, maybe the best of things and no good thing ever dies... The Shawshank Redemption, replique of Tim Robbins

From thomas.witkowski at tu-dresden.de Wed Dec 22 08:01:13 2010 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Wed, 22 Dec 2010 15:01:13 +0100 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D11BFCA.2030800@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> <4D11BFCA.2030800@tu-dresden.de> Message-ID: <4D1204A9.7090209@tu-dresden.de> So, I found the problem related to empty partitions. It is not possible to weight vertices (i.e. elements of a mesh) in such a way that one weight is much higher than the other ones. For more details see http://glaros.dtc.umn.edu/flyspray/task/11 It's a pity that ParMetis makes it very hard to find this kind of error. The open question for me is about the non-contiguous partitions. Is it normal behavior for ParMetis to create partitions that are not contiguous? Thomas Thomas Witkowski wrote: > Okay, in my computations, I have empty partitions on some ranks and > definitely not > minimal boundary sizes. So may be I generate a wrong input. But if > this is the case, I > wonder why the resulting mesh partitioning is quite good. If I neglect > the problem of > empty partitions, the redistributed mesh leads to a very good load > balancing. Is there > any meaningful way to debug the problem? Is there something link a > "verbose mode" in > ParMetis that says me whats happen on the input data? Otherwise I have > to print all the > input data to the screen and check it by hand. Although I have a quite > small example with > 128 overall coarse mesh elements on 8 ranks, this is not big fun :) > > Thomas > > @Matthew: By mistake I've answered your mail directly to you and not > to the mailing list, therefore I sent it now here again > > Matthew Knepley wrote: >> On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski >> > > wrote: >> >> Hi, >> >> I have a not directly PETSc related question, but I hope to get >> some answer from the community here. In my FEM code, I make use of >> ParMETIS to partition the mesh. I make direct use of this library >> and not of PETSc's ParMETIS integration. The initial partition is >> always fine, but I use the ParMETIS_V3_AdaptiveRepart function for >> repartition the mesh due to local mesh adaption. In most cases, >> the result is fine, but there are two points, where I have trouble >> with: >> >> 1) Sometimes ParMETIS generates empty partitions, i.e., a >> processor has zero mesh elements. This is something my code cannot >> handle. Is this a bug or a feature? If it is a feature, is there >> any possiblity to disable it? >> >> >> ParMetis has a balance constraint if you weight vertices. This will >> enforce equal size partitions. >> >> >> 2) In most cases the specific partitions are not connected. If I >> put all data to ParMETIS in a correct way, is this okay? My code >> can handle it, but is slows down the computation due to larger >> interior boundaries and therefore to more communications. >> >> >> ParMetis minimizes the overall boundary size, so I do not understand >> how you could see this slowdown.
>> >> Matt >> >> >> Does anyone of you know an answer to these question? Is there a >> debug mode >> in ParMETIS, where I can see which data is set to its >> function calls? >> >> Regards, >> >> Thomas >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > >

From bsmith at mcs.anl.gov Wed Dec 22 08:09:33 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 22 Dec 2010 08:09:33 -0600 Subject: [petsc-users] Reading matrices into PETSc In-Reply-To: References: Message-ID: On Dec 22, 2010, at 1:56 AM, Gaurish Telang wrote: > I am sorry, but I am not yet clear on how to do this. I read ex32.c and ex72.c but I am still confused. What is ASCII 'slap' format? How should matrix be supplied to PETSc? You will need to modify one of those examples slightly to read in the exact format of the ASCII file that you have. The names of the various formats don't matter; you just need to match the reading of the ASCII file to the exact format of your file. Barry > > My matrix is a 2000x1900 matrix given in the format MATLAB stores sparse matrices. i.e [row, column, non-zero-entry] format. > > > > On Tue, Dec 21, 2010 at 11:03 PM, Gaurish Telang wrote: > i have this large text file containing a matrix. > > This text file contains the non-zero entries of a very large sparse matrix > > > The first two columns indicate the position of the non-zero entry > > and the last column the actual non-zero value it self > > > for > example the matrix > > 1 0 8 > > 0 0 5 > > 6 0 0 > > is written in the text file in the form of > > 1 1 1 > > 1 3 8 > > 2 3 5 > > 3 1 6 > > This is the standard [(row ,column), non-zero] entry format. > i want PETSc to load this matrix > > from the text file > > i am not sure how > > to do that. What commands do I use? > > I am new to PETSc, so some detail in the explanation will be really helpful. > > Sincere thanks, > > Gaurish. >

From yjxd.chen at gmail.com Wed Dec 22 09:55:23 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 16:55:23 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: Message-ID: Satish, I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speedup now keeps increasing as the number of cores goes from 1 to 16. However, the maximum speedup is still only around 6.0 with 16 cores. The new log files can be found in the attachment. (1) I checked the configuration of the first server again. This server is a shared-memory computer, with
Processors: 4 CPUs * 4 cores/CPU, each core 2500 MHz
Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit,
so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s. It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers. Is this correct? Does a shared-memory computer offer no benefit for PETSc when the memory bandwidth is limited? (2) Besides, we would like to continue our work by employing a matrix partitioning/reordering algorithm, such as Metis or ParMetis, to improve the speedup of the program. (The current program works without any matrix decomposition.) Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html , "Reordering a matrix can result in fewer iterations for an iterative solver".
Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions? Any comments are very welcome! Thank you! On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote: > On Mon, 20 Dec 2010, Yongjun Chen wrote: > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly > and > > see what I can get. > > hydra is just the process manager. > > Also --download-mpich uses a slightly older version - with > device=ch3:sock for portability and valgrind reasons [development] > > You might want to install latest mpich manually with the defaut > device=ch3:nemsis and recheck.. > > satish > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Process 0 of total 4 on wmss04 Process 2 of total 4 on wmss04 Process 1 of total 4 on wmss04 Process 3 of total 4 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:41:09 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28342e-06 Norm of error 1.28342e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 420.527 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:48:09 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny Wed Dec 22 12:48:09 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 4.531e+02 1.00000 4.531e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 Flops/sec: 3.438e+08 1.06872 3.361e+08 1.344e+09 MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.5314e+02 100.0% 6.0914e+11 100.0% 1.772e+04 100.0% 2.658e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.7876e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1617 MatMultTranspose 1473 1.0 1.7886e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 1615 MatAssemblyBegin 1 1.0 3.2670e-0312.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 6.1171e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 1.6379e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0934e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecDot 2946 1.0 1.9010e+01 2.2 1.73e+09 1.0 0.0e+00 0.0e+00 2.9e+03 3 1 0 0 66 3 1 0 0 66 365 VecNorm 1475 1.0 1.0313e+01 2.8 8.69e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 337 VecCopy 4 1.0 5.2447e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.8803e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 1.3866e+01 1.5 2.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 751 VecAYPX 2944 1.0 1.0440e+01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 664 VecAssemblyBegin 6 1.0 1.0071e-0161.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 2.4080e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 1.6040e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216 VecScatterBegin 2947 1.0 1.7367e+00 2.2 0.00e+00 0.0 1.8e+04 2.7e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 3.0331e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 KSPSetup 1 1.0 1.3974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.0934e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 4.4e+03 90100100100 99 90100100100 99 1488 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 1.6080e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 216 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 169902696 0 Vec 18 18 31282096 0 Vec Scatter 2 2 1736 0 Index Set 4 4 638616 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.19209e-06 Average time for MPI_Barrier(): 5.97954e-05 Average time for zero size MPI_Send(): 2.07424e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:12:03 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 291.989 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:16:55 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 12:16:55 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.113e+02 1.00000 3.113e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.503e+08 1.09702 2.446e+08 1.957e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.1128e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. 
len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.2879e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2244 MatMultTranspose 1473 1.0 1.2240e+02 1.3 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2360 MatAssemblyBegin 1 1.0 3.1061e-03 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.0727e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.2912e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1926e+0113.1 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 6.5343e+0113.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 106 VecNorm 1475 1.0 6.9889e+00 3.6 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 497 VecCopy 4 1.0 5.1496e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.2587e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 8.7103e+00 1.5 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1195 VecAYPX 2944 1.0 5.7803e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1200 VecAssemblyBegin 6 1.0 3.9916e-0214.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.6001e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.6749e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 400 VecScatterBegin 2947 1.0 1.9621e+00 2.7 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 5.9072e+0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 8.9231e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.7991e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 90100100100 99 90100100100 99 2175 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.7041e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 399 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 4.3869e-06 Average time for MPI_Barrier(): 7.25746e-05 Average time for zero size MPI_Send(): 2.06232e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 11 of total 12 on wmss04 Process 1Process 3 of total 12 on wmss04 of total 12 on wmss04 Process 5 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Process 9 of total 12 on wmss04 Process 10 of total 12 on wmss04 Process 7 of total 12 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 12:13:43 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 253.909 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 12:17:57 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 13:17:57 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.721e+02 1.00000 2.721e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 1.910e+08 1.11689 1.865e+08 2.238e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.7212e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.2467e+02 1.6 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 2318 MatMultTranspose 1473 1.0 1.0645e+02 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2712 MatAssemblyBegin 1 1.0 4.0723e-0274.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.3137e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.8801e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2262e+0190.2 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 6.1395e+0111.5 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 9 1 0 0 66 9 1 0 0 66 113 VecNorm 1475 1.0 5.8101e+00 3.3 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 598 VecCopy 4 1.0 5.6744e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.1137e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 6.6266e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1571 VecAYPX 2944 1.0 5.2210e+00 2.3 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1328 VecAssemblyBegin 6 1.0 5.0129e-0218.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 4.7922e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 7.0911e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 490 VecScatterBegin 2947 1.0 2.5096e+00 3.1 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 4.4540e+01 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 7.9119e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.4149e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 89100100100 99 89100100100 99 2521 PCSetUp 1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 7.1207e+00 1.6 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 488 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 6.00815e-06 Average time for MPI_Barrier(): 0.000122833 Average time for zero size MPI_Send(): 2.81533e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 3 of total 16 on wmss04 Process 7 of total 16 on wmss04 Process 1 of total 16 on wmss04 Process 15 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 13 of total 16 on wmss04 Process 11 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 0 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 8 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly.End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 11:23:54 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.194e-06 Norm of error 1.194e-06, Iterations 1495 ========================================================= The solver has finished successfully! ========================================================= The solving time is 240.208 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 11:27:54 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 12:27:54 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.565e+02 1.00001 2.565e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.959e+10 1.13060 3.859e+10 6.174e+11 Flops/sec: 1.543e+08 1.13060 1.504e+08 2.407e+09 MPI Messages: 1.198e+04 3.99917 7.118e+03 1.139e+05 MPI Message Lengths: 1.948e+09 7.80981 1.819e+05 2.071e+10 MPI Reductions: 4.543e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.5651e+02 100.0% 6.1737e+11 100.0% 1.139e+05 100.0% 1.819e+05 100.0% 4.527e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1496 1.0 1.1625e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 38 47 50 50 0 38 47 50 50 0 2520 MatMultTranspose 1495 1.0 9.7790e+01 1.2 1.88e+10 1.1 5.7e+04 1.8e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 2994 MatAssemblyBegin 1 1.0 6.3910e-0314.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.2797e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 3.0708e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.1235e+01111.3 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2990 1.0 5.7054e+0114.6 4.40e+08 1.0 0.0e+00 0.0e+00 3.0e+03 9 1 0 0 66 9 1 0 0 66 123 VecNorm 1497 1.0 5.8130e+00 3.5 2.20e+08 1.0 0.0e+00 0.0e+00 1.5e+03 2 1 0 0 33 2 1 0 0 33 607 VecCopy 4 1.0 3.3658e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8975 1.0 2.5879e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4486 1.0 7.5991e+00 1.6 6.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1391 VecAYPX 2988 1.0 4.6226e+00 1.6 4.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1523 VecAssemblyBegin 6 1.0 3.9858e-0213.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 6.6996e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2992 1.0 7.0992e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 496 VecScatterBegin 2991 1.0 3.3736e+00 3.7 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2991 1.0 3.3633e+01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 5.6469e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2884e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 4.5e+03 89100100100 99 89100100100 99 2697 PCSetUp 1 1.0 5.0068e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2992 1.0 7.1263e+00 1.5 2.20e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 494 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 42424600 0 Vec 18 18 7924896 0 Vec Scatter 2 2 1736 0 Index Set 4 4 247632 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 8.91685e-06 Average time for MPI_Barrier(): 0.000128984 Average time for zero size MPI_Send(): 1.8239e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 11:56:02 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:sock --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 11:56:30 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3sock ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3sock/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt 
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------

From bsmith at mcs.anl.gov  Wed Dec 22 10:40:49 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Wed, 22 Dec 2010 10:40:49 -0600
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>

On Dec 22, 2010, at 9:55 AM, Yongjun Chen wrote:

> Satish,
>
> I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
>
> (1)
>
> I checked the configuration of the first server again. This server is a shared-memory computer, with
>
> Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
>
> Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.

   Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.

   Barry

> It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers.
>
> Is this correct? Does the shared-memory type of computer have no benefit for PETSc when the memory bandwidth is limited?
>
> (2)
>
> Besides, we would like to continue our work by employing a matrix partitioning / reordering algorithm, such as Metis or ParMetis, to improve the speed up performance of the program. (The current program works without any matrix decomposition.)
>
> Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html, "Reordering a matrix can result in fewer iterations for an iterative solver".
>
> Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions?
>
> Any comments are very welcome! Thank you!
>
> On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote:
> > On Mon, 20 Dec 2010, Yongjun Chen wrote:
> >
> > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and
> > > see what I can get.
> >
> > hydra is just the process manager.
> >
> > Also --download-mpich uses a slightly older version - with
> > device=ch3:sock for portability and valgrind reasons [development]
> >
> > You might want to install latest mpich manually with the default
> > device=ch3:nemsis and recheck..
> >
> > satish
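Barry's point can be made concrete with a back-of-the-envelope count for a generic compressed-row (CSR) matrix-vector product, the kind of kernel a sparse iterative solver spends most of its time in. The loop below is only an illustrative plain-C sketch, not PETSc's MatMult implementation; the traffic estimate in the comment is the usual rough count.

#include <stddef.h>

/* Generic CSR y = A*x. For each stored nonzero we read one column index
 * (4 bytes) and one value (8 bytes) and perform 2 flops (multiply + add),
 * so the kernel needs roughly 6 bytes of memory traffic per flop even
 * before counting accesses to x and y. At ~6 bytes/flop, a core fed with
 * 2.7 GB/s can sustain only a few hundred Mflop/s, no matter how fast its
 * arithmetic units are - which is why sparse solvers are bandwidth-bound. */
void csr_matvec(size_t nrows, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
  for (size_t i = 0; i < nrows; i++) {
    double sum = 0.0;
    for (int k = rowptr[i]; k < rowptr[i+1]; k++)
      sum += val[k] * x[colind[k]];   /* ~12 bytes read, 2 flops */
    y[i] = sum;
  }
}

That estimate is roughly the order of magnitude seen in the attached -log_summary output, where the MatMult lines work out to a few hundred Mflop/s per process.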
From yjxd.chen at gmail.com  Wed Dec 22 10:46:26 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 17:46:26 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:

> On Dec 22, 2010, at 9:55 AM, Yongjun Chen wrote:
>
> > Satish,
> >
> > I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
> >
> > (1)
> >
> > I checked the configuration of the first server again. This server is a shared-memory computer, with
> >
> > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> >
> > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
>
>    Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
>
>    Barry

Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.

Yongjun

> > It seems that each core can get 2.7 GB/s memory bandwidth, which can fulfill the basic requirement for sparse iterative solvers.
> >
> > Is this correct? Does the shared-memory type of computer have no benefit for PETSc when the memory bandwidth is limited?
> >
> > (2)
> >
> > Besides, we would like to continue our work by employing a matrix partitioning / reordering algorithm, such as Metis or ParMetis, to improve the speed up performance of the program. (The current program works without any matrix decomposition.)
> >
> > Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html, "Reordering a matrix can result in fewer iterations for an iterative solver".
> >
> > Do you think the matrix partitioning/reordering will work for this program? Or any further suggestions?
> >
> > Any comments are very welcome! Thank you!
> >
> > On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay wrote:
> > > On Mon, 20 Dec 2010, Yongjun Chen wrote:
> > >
> > > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra firstly and
> > > > see what I can get.
> > >
> > > hydra is just the process manager.
> > >
> > > Also --download-mpich uses a slightly older version - with
> > > device=ch3:sock for portability and valgrind reasons [development]
> > >
> > > You might want to install latest mpich manually with the default
> > > device=ch3:nemsis and recheck..
> > >
> > > satish
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
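On the Metis/ParMetis question quoted above: PETSc exposes ParMetis through its MatPartitioning interface. The fragment below is only a minimal sketch against the petsc-3.1 C API used in this thread; the function names are real, but the surrounding program, error handling and the actual redistribution of the application data are omitted, and MatPartitioningSetAdjacency() is normally fed an adjacency matrix describing the nonzero pattern (see MatCreateMPIAdj()), so a conversion step may be needed depending on the matrix format. Note also that with a Jacobi preconditioner a symmetric permutation should mainly reduce off-process communication in MatMult rather than the iteration count.

#include "petscmat.h"

/* Sketch: ask the partitioner (e.g. ParMetis) for a better row
 * distribution of a parallel matrix A. On return, "is" holds, for every
 * locally owned row, the rank that the partitioner wants to own it.
 * (petsc-3.1 calling sequence; later releases changed the Destroy()
 * routines to take a pointer.) */
PetscErrorCode suggest_partition(Mat A, IS *is)
{
  MatPartitioning part;
  PetscErrorCode  ierr;

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
  /* pick the package at run time, e.g. -mat_partitioning_type parmetis */
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
  ierr = MatPartitioningApply(part, is);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);
  return 0;
}

The resulting IS can then be turned into a new global numbering with ISPartitioningToNumbering() and ISPartitioningCount() before the matrix and vectors are reassembled with the new layout.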
From balay at mcs.anl.gov  Wed Dec 22 10:48:07 2010
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 22 Dec 2010 10:48:07 -0600 (CST)
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Wed, 22 Dec 2010, Yongjun Chen wrote:

> Satish,
>
> I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.

Perhaps this was mentioned earlier: performance doesn't scale with the number of cores alone. [It depends on both scalable compute units - aka cores - and scalable memory modules.]

When the hardware is not designed to provide scalable performance - expecting it is wrong. The goal should be to extract max performance out of a given piece of hardware - not scalable performance.

Wrt with-device=ch3:sock - it might not be the best performer for shared memory. Try the default 'device=ch3:nemsis'.

Satish

From balay at mcs.anl.gov  Wed Dec 22 10:54:53 2010
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 22 Dec 2010 10:54:53 -0600 (CST)
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, 22 Dec 2010, Yongjun Chen wrote:

> On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:
>
> > > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> > >
> > > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
> >
> > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
>
> Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.

What hardware is this? [processor/chipset?]

From what you say - it looks like each chip has 4 cores, and 2 dual-channel memory controllers for each of them.

The question is - does the hardware provide scalable memory bandwidth per core? Most machines don't.

I.e. the same 5.4*2 GB/s is available for a 1-core run as well as a 4-core run.

So if the algorithm is able to use 5.4 GB/s [or more] for 1 thread, 10.8 [or more] for 2 threads - you would just see scalable performance from 1 to 2, and 3, 4 would perhaps be slightly incremental to the 2-core performance.

Satish
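Whether the memory bandwidth really scales with the number of busy cores, as discussed above, can be measured directly. The toy MPI program below is not from this thread and its sizes are arbitrary; it runs a STREAM-triad-style loop on every rank, so running it with mpiexec -n 1, 2, 4, ... shows how the aggregate GB/s grows. For reference, one DDR2-333 dual-channel pair moves about 8 bytes * 333 MT/s * 2 channels = 5.3 GB/s, so with 8 such pairs the theoretical aggregate is around 43 GB/s; the interesting question is how much of that a given number of cores can actually reach.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N    10000000   /* 10M doubles per array, ~80 MB each */
#define REPS 20

int main(int argc, char **argv)
{
  int    rank, size;
  double *a, *b, *c, t, gbps, total;
  long   i, r;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  a = malloc(N * sizeof(double));
  b = malloc(N * sizeof(double));
  c = malloc(N * sizeof(double));
  for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);
  t = MPI_Wtime();
  for (r = 0; r < REPS; r++)
    for (i = 0; i < N; i++)
      a[i] = b[i] + 3.0 * c[i];        /* triad: at least 24 bytes of traffic per update */
  MPI_Barrier(MPI_COMM_WORLD);
  t = MPI_Wtime() - t;

  gbps = 24.0 * N * REPS / t / 1e9;    /* per-process bandwidth, counting 24 bytes/update */
  MPI_Reduce(&gbps, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (!rank)
    printf("%d processes: %.1f GB/s aggregate (%.1f GB/s per process)\n",
           size, total, total / size);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Compile with mpicc and run at increasing process counts; if the aggregate bandwidth stops growing after a few processes per socket, the flat speed up of the bicg/jacobi runs in the attached logs is what one would expect.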
From yjxd.chen at gmail.com  Wed Dec 22 11:12:43 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 18:12:43 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID: 

On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote:

> On Wed, 22 Dec 2010, Yongjun Chen wrote:
> > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote:
> > > > Processors: 4 CPUs * 4 cores/CPU, each core at 2500 MHz
> > > >
> > > > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the memory bandwidth for 2 memory modules is 64/8*166*2*2 = 5.4 GB/s.
> > >
> > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers, in fact this is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
> >
> > Barry, there are 16 memory modules, every 2 modules make up one dual channel, thus in this machine there are 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.
>
> What hardware is this? [processor/chipset?]
>
By dmidecode, it shows the processor is

Handle 0x0010, DMI type 4, 40 bytes
Processor Information
    Socket Designation: CPU 4
    Type: Central Processor
    Family: Quad-Core Opteron
    Manufacturer: AMD
    ID: 06 05 F6 40 74 03 E8 3D
    Signature: Family 5, Model 0, Stepping 6
    Flags:
        DE (Debugging extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        CLFSH (CLFLUSH instruction supported)
        DS (Debug store)
        ACPI (ACPI supported)
        MMX (MMX technology supported)
        FXSR (Fast floating-point save and restore)
        SSE2 (Streaming SIMD extensions 2)
        SS (Self-snoop)
        HTT (Hyper-threading technology)
        TM (Thermal monitor supported)
    Version: Quad-Core AMD Opteron(tm) Processor 8360 SE
    Voltage: 1.5 V
    External Clock: 200 MHz
    Max Speed: 4600 MHz
    Current Speed: 2500 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: 0x0011
    L2 Cache Handle: 0x0012
    L3 Cache Handle: 0x0013
    Serial Number: N/A
    Asset Tag: N/A
    Part Number: N/A
    Core Count: 4
    Core Enabled: 4
    Characteristics:
        64-bit capable

> From what you say - it looks like each chip has 4 cores, and 2 dual-channel memory controllers for each of them.
>
> The question is - does the hardware provide scalable memory bandwidth per core? Most machines don't.

This point is not clear for me right now.

> I.e. the same 5.4*2 GB/s is available for a 1-core run as well as a 4-core run.
>
> So if the algorithm is able to use 5.4 GB/s [or more] for 1 thread, 10.8 [or more] for 2 threads - you would just see scalable performance from 1 to 2, and 3, 4 would perhaps be slightly incremental to the 2-core performance.
>
> Satish
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yjxd.chen at gmail.com  Wed Dec 22 11:23:57 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Wed, 22 Dec 2010 18:23:57 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To: 
References: 
Message-ID: 

On Wed, Dec 22, 2010 at 5:48 PM, Satish Balay wrote:

> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > Satish,
> >
> > I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speed up can now keep increasing as the computing cores increase from 1 to 16. However, the maximum speed up is still only around 6.0 with 16 cores. The new log files can be found in the attachment.
>
> Perhaps this was mentioned earlier: performance doesn't scale with the number of cores alone. [It depends on both scalable compute units - aka cores - and scalable memory modules.]
>
> When the hardware is not designed to provide scalable performance - expecting it is wrong. The goal should be to extract max performance out of a given piece of hardware - not scalable performance.
>
> Wrt with-device=ch3:sock - it might not be the best performer for shared memory. Try the default 'device=ch3:nemsis'.
>
> Satish

I am now trying with --with-device=ch3:nemsis. Hope it can deliver a little better performance.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From balay at mcs.anl.gov Wed Dec 22 11:32:10 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 22 Dec 2010 11:32:10 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, 22 Dec 2010, Yongjun Chen wrote: > On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote: > > > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > > > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith wrote: > > > > > > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz > > > > > > > > > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, so > > the > > > > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s. > > > > > > > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not > > > > enough for iterative solvers, in fact this is absolutely terrible for > > > > iterative solvers. You really want 5.4 GB/s PER core! This machine is > > > > absolutely inappropriate for iterative solvers. No package can give you > > good > > > > speedups on this machine. > > > > > > Barry, there are 16 memories, every 2 memories make up one dual channel, > > > thus in this machine there are 8 dual channel, each dual channel has the > > > memory bandwidth 5.4GB/s. > > > > What hardware is this? [processor/chipset?] > > > > By dmidecode, it shows the processor is > > Handle 0x0010, DMI type 4, 40 bytes > Processor Information > Socket Designation: CPU 4 > Type: Central Processor > Family: Quad-Core Opteron > Manufacturer: AMD > ID: 06 05 F6 40 74 03 E8 3D > Signature: Family 5, Model 0, Stepping 6 > Flags: > DE (Debugging extension) > TSC (Time stamp counter) > MSR (Model specific registers) > PAE (Physical address extension) > CX8 (CMPXCHG8 instruction supported) > APIC (On-chip APIC hardware supported) > CLFSH (CLFLUSH instruction supported) > DS (Debug store) > ACPI (ACPI supported) > MMX (MMX technology supported) > FXSR (Fast floating-point save and restore) > SSE2 (Streaming SIMD extensions 2) > SS (Self-snoop) > HTT (Hyper-threading technology) > TM (Thermal monitor supported) > Version: Quad-Core AMD Opteron(tm) Processor 8360 SE > Voltage: 1.5 V > External Clock: 200 MHz > Max Speed: 4600 MHz > Current Speed: 2500 MHz > Status: Populated, Enabled > Upgrade: Other > L1 Cache Handle: 0x0011 > L2 Cache Handle: 0x0012 > L3 Cache Handle: 0x0013 > Serial Number: N/A > Asset Tag: N/A > Part Number: N/A > Core Count: 4 > Core Enabled: 4 > Characteristics: > 64-bit capable ok - your machine has the following schematic.. [from google] http://www.qdpma.com/SystemArchitecture_files/013_Opteron.png > > >From what you say - it looks like each chip has 4cores, and 2 > > dual-channel memory controllers for each of them. > > > > The question is - does the hardware provide scalable memory-bandwidth > > per core? Most machines don't. > > > > This point is not clear for me right now. Hm.. the point is: the hardware designer had 2 choices: - provide a single memory controller per core [so each core gets only 2.7gb/s - i.e 4 memory controllers per CPU, and common L2 cache across all cores not possible] - provide a single memory controller with 2-dual memory channels [i.e 10.8GB/s] thats shared by 1-4 cores. With this - there can be a single L2 cache for all 4 cores. Which of the above 2 is a good design? The first one provides scalable performance - but the second one doesn't. Also the first one limits the performance of sequential [np=1 applications]. 
The second one provides all bandwidth to even np=1 codes - so they might have better sequential performane. And then performance differences due to different cache synchronization issues.. Satish > > > > > I.e the same 5.4*2GB/s is avilable for 1 core run as well as the 4 core > > run. > > > > So if the algorithm is able to use 5.4GB/s [or more] for 1 threads, > > 10.8 [or more] for 2 threads - you would just see scalable performance > > from 1 to 2, and 3, 4 would perhaps be slightly incremental to the > > 2-core performance. > > > > Satish > > > From yjxd.chen at gmail.com Wed Dec 22 11:49:40 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 18:49:40 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay wrote: > > > > > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > > > > > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith > wrote: > > > > > > > > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz > > > > > > > > > > > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, > so > > > the > > > > > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s. > > > > > > > > > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is > not > > > > > enough for iterative solvers, in fact this is absolutely terrible > for > > > > > iterative solvers. You really want 5.4 GB/s PER core! This machine > is > > > > > absolutely inappropriate for iterative solvers. No package can give > you > > > good > > > > > speedups on this machine. > > > > > > > > Barry, there are 16 memories, every 2 memories make up one dual > channel, > > > > thus in this machine there are 8 dual channel, each dual channel has > the > > > > memory bandwidth 5.4GB/s. > > > > > > What hardware is this? [processor/chipset?] > > > > > > > By dmidecode, it shows the processor is > > > > Handle 0x0010, DMI type 4, 40 bytes > > Processor Information > > Socket Designation: CPU 4 > > Type: Central Processor > > Family: Quad-Core Opteron > > Manufacturer: AMD > > ID: 06 05 F6 40 74 03 E8 3D > > Signature: Family 5, Model 0, Stepping 6 > > Flags: > > DE (Debugging extension) > > TSC (Time stamp counter) > > MSR (Model specific registers) > > PAE (Physical address extension) > > CX8 (CMPXCHG8 instruction supported) > > APIC (On-chip APIC hardware supported) > > CLFSH (CLFLUSH instruction supported) > > DS (Debug store) > > ACPI (ACPI supported) > > MMX (MMX technology supported) > > FXSR (Fast floating-point save and restore) > > SSE2 (Streaming SIMD extensions 2) > > SS (Self-snoop) > > HTT (Hyper-threading technology) > > TM (Thermal monitor supported) > > Version: Quad-Core AMD Opteron(tm) Processor 8360 SE > > Voltage: 1.5 V > > External Clock: 200 MHz > > Max Speed: 4600 MHz > > Current Speed: 2500 MHz > > Status: Populated, Enabled > > Upgrade: Other > > L1 Cache Handle: 0x0011 > > L2 Cache Handle: 0x0012 > > L3 Cache Handle: 0x0013 > > Serial Number: N/A > > Asset Tag: N/A > > Part Number: N/A > > Core Count: 4 > > Core Enabled: 4 > > Characteristics: > > 64-bit capable > > ok - your machine has the following schematic.. [from google] > > http://www.qdpma.com/SystemArchitecture_files/013_Opteron.png > > > > >From what you say - it looks like each chip has 4cores, and 2 > > > dual-channel memory controllers for each of them. 
> > > > > > The question is - does the hardware provide scalable memory-bandwidth > > > per core? Most machines don't. > > > > > > > This point is not clear for me right now. > > Hm.. the point is: the hardware designer had 2 choices: > > - provide a single memory controller per core [so each core gets only > 2.7gb/s - i.e 4 memory controllers per CPU, and common L2 cache > across all cores not possible] > > - provide a single memory controller with 2-dual memory channels [i.e > 10.8GB/s] thats shared by 1-4 cores. With this - there can be a > single L2 cache for all 4 cores. > > Which of the above 2 is a good design? The first one provides scalable > performance - but the second one doesn't. Also the first one limits > the performance of sequential [np=1 applications]. The second one > provides all bandwidth to even np=1 codes - so they might have better > sequential performane. And then performance differences due to different > cache synchronization issues.. > > Satish > > Thanks a lot, Satish. It is much clear now. But for the choice of the two, the program dmidecode does not show this information. Do you know any way to get it? > > > > > > > > > > > I.e the same 5.4*2GB/s is avilable for 1 core run as well as the 4 core > > > run. > > > > > > So if the algorithm is able to use 5.4GB/s [or more] for 1 threads, > > > 10.8 [or more] for 2 threads - you would just see scalable performance > > > from 1 to 2, and 3, 4 would perhaps be slightly incremental to the > > > 2-core performance. > > > > > > Satish > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Dec 22 11:53:04 2010 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 22 Dec 2010 11:53:04 -0600 (CST) Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, 22 Dec 2010, Yongjun Chen wrote: > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > > > Thanks a lot, Satish. It is much clear now. But for the choice of the two, > the program dmidecode does not show this information. Do you know any way to > get it? why do you expect dmidecode to show that? You'll have to look for the CPU/chipset hardware documentation - and look at the details - and sometimes they mention these details.. Satish From yjxd.chen at gmail.com Wed Dec 22 12:11:12 2010 From: yjxd.chen at gmail.com (Yongjun Chen) Date: Wed, 22 Dec 2010 19:11:12 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote: > On Wed, 22 Dec 2010, Yongjun Chen wrote: > > > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay wrote: > > > > > Thanks a lot, Satish. It is much clear now. But for the choice of the > two, > > the program dmidecode does not show this information. Do you know any way > to > > get it? > > why do you expect dmidecode to show that? > > You'll have to look for the CPU/chipset hardware documentation - and > look at the details - and sometimes they mention these details.. > > Satish > Thanks, Satish. Yes, I need to check it. Just now I re-configured PETSC with the option --with-device=ch3:nemsis. The results are almost the same as --with-device=ch3:sock. As can be seen in the attachment. I hope the matrix partitioning - reordering algorithm would have some positive effects. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- Process 0 of total 8 on wmss04 Process 4 of total 8 on wmss04 Process 1 of total 8 on wmss04 Process 5 of total 8 on wmss04 Process 6 of total 8 on wmss04 Process 2 of total 8 on wmss04 Process 3 of total 8 on wmss04 Process 7 of total 8 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly.End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:41:47 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.32502e-06 Norm of error 1.32502e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 333.681 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:47:21 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 18:47:21 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 3.558e+02 1.00000 3.558e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 Flops/sec: 2.190e+08 1.09702 2.140e+08 1.712e+09 MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.5581e+02 100.0% 6.0914e+11 100.0% 4.135e+04 100.0% 2.430e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.5404e+02 1.6 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 35 47 50 50 0 35 47 50 50 0 1876 MatMultTranspose 1473 1.0 1.4721e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50 0 37 47 50 50 0 1962 MatAssemblyBegin 1 1.0 6.0289e-0316.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.2618e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.0790e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.0855e+0112.8 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 9.9344e+0120.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 12 1 0 0 66 12 1 0 0 66 70 VecNorm 1475 1.0 5.6723e+00 2.9 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 613 VecCopy 4 1.0 5.5063e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.1978e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 8.6108e+00 1.3 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1209 VecAYPX 2944 1.0 6.0635e+00 1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1144 VecAssemblyBegin 6 1.0 4.8455e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 3.5286e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 8.7080e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 399 VecScatterBegin 2947 1.0 1.8601e+00 2.6 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 2947 1.0 9.0296e+0116.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 KSPSetup 1 1.0 9.8538e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.2263e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 91100100100 99 91100100100 99 1887 PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 8.7381e+00 1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 397 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 84944064 0 Vec 18 18 15741712 0 Vec Scatter 2 2 1736 0 Index Set 4 4 409008 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 4.98295e-06 Average time for MPI_Barrier(): 9.76086e-05 Average time for zero size MPI_Send(): 2.81334e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib 
-ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 12 on wmss04 Process 4 of total 12 on wmss04 Process 6 of total 12 on wmss04 Process 5 of total 12 on wmss04Process 11 of total 12 on wmss04 Process 2 of total 12 on wmss04 Process 7 of total 12 on wmss04 Process 3 of total 12 on wmss04 Process 8 of total 12 on wmss04 Process 1 of total 12 on wmss04 Process 9 of total 12 on wmss04 Process 10 of total 12 on wmss04 The dimension of Matrix A is n = 1177754 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:55:12 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.28414e-06 Norm of error 1.28414e-06, Iterations 1473 ========================================================= The solver has finished successfully! ========================================================= The solving time is 241.392 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:59:13 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 18:59:13 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.594e+02 1.00000 2.594e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 Flops/sec: 2.004e+08 1.11689 1.956e+08 2.348e+09 MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 MPI Reductions: 4.477e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.5935e+02 100.0% 6.0890e+11 100.0% 6.498e+04 100.0% 2.345e+05 100.0% 4.461e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1474 1.0 1.1203e+02 1.5 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 39 47 50 50 0 39 47 50 50 0 2579 MatMultTranspose 1473 1.0 9.9342e+01 1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2906 MatAssemblyBegin 1 1.0 3.7930e-03 8.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.1536e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 2.2507e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2744e+0166.4 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2946 1.0 5.4256e+0115.3 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03 6 1 0 0 66 6 1 0 0 66 128 VecNorm 1475 1.0 7.3386e+00 5.2 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 473 VecCopy 4 1.0 6.2873e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8843 1.0 2.5036e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4420 1.0 7.4288e+00 1.8 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1401 VecAYPX 2944 1.0 5.0487e+00 2.5 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1374 VecAssemblyBegin 6 1.0 3.4969e-0211.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 5.5075e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2948 1.0 7.2035e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 482 VecScatterBegin 2947 1.0 2.5759e+00 2.7 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2947 1.0 5.1555e+0111.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 KSPSetup 1 1.0 8.2631e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2851e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 88100100100 99 88100100100 99 2664 PCSetUp 1 1.0 7.1526e-06 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2948 1.0 7.2339e+00 1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 480 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 56593044 0 Vec 18 18 10534536 0 Vec Scatter 2 2 1736 0 Index Set 4 4 305424 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 7.82013e-06 Average time for MPI_Barrier(): 9.52244e-05 Average time for zero size MPI_Send(): 2.15769e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib 
-ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ -------------- next part -------------- Process 0 of total 16 on wmss04 Process 8 of total 16 on wmss04 Process 4 of total 16 on wmss04 Process 6 of total 16 on wmss04 Process 14 of total 16 on wmss04 Process 12 of total 16 on wmss04 Process 2 of total 16 on wmss04 Process 10 of total 16 on wmss04 Process Process 3 of total 16 on wmss04 Process 15 of total 16 on wmss04 7 of total 16 on wmss04Process 1 of total 16 on wmss04 Process 9 of total 16 on wmss04 Process 5 of total 16 on wmss04 Process 13 of total 16 on wmss04 The dimension of Matrix A is n = 1177754 Process 11 of total 16 on wmss04 Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly: Begin Assembly:Begin Assembly: Begin Assembly: End Assembly. End Assembly. End Assembly.End Assembly. End Assembly.End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. End Assembly. ========================================================= Begin the solving: ========================================================= The current time is: Wed Dec 22 17:50:47 2010 KSP Object: type: bicg maximum iterations=10000, initial guess is zero tolerances: relative=1e-07, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: type: jacobi linear system matrix = precond matrix: Matrix Object: type=mpisbaij, rows=1177754, cols=1177754 total: nonzeros=49908476, allocated nonzeros=49908476 block size is 1 norm(b-Ax)=1.23596e-06 Norm of error 1.23596e-06, Iterations 1481 ========================================================= The solver has finished successfully! ========================================================= The solving time is 227.888 seconds. The time accuracy is 1e-06 second. The current time is Wed Dec 22 17:54:35 2010 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 18:54:35 2010 Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 Max Max/Min Avg Total Time (sec): 2.442e+02 1.00001 2.442e+02 Objects: 3.000e+01 1.00000 3.000e+01 Flops: 3.922e+10 1.13060 3.822e+10 6.116e+11 Flops/sec: 1.606e+08 1.13060 1.565e+08 2.504e+09 MPI Messages: 1.187e+04 3.99916 7.051e+03 1.128e+05 MPI Message Lengths: 1.929e+09 7.80850 1.819e+05 2.052e+10 MPI Reductions: 4.501e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.4422e+02 100.0% 6.1159e+11 100.0% 1.128e+05 100.0% 1.819e+05 100.0% 4.485e+03 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1482 1.0 1.1549e+02 2.0 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 36 47 50 50 0 36 47 50 50 0 2513 MatMultTranspose 1481 1.0 9.3652e+01 1.4 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 32 47 50 50 0 32 47 50 50 0 3097 MatAssemblyBegin 1 1.0 4.6110e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 5.1871e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatView 1 1.0 5.1212e-04 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecView 1 1.0 1.2031e+01123.8 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecDot 2962 1.0 7.2313e+0122.5 4.36e+08 1.0 0.0e+00 0.0e+00 3.0e+03 13 1 0 0 66 13 1 0 0 66 96 VecNorm 1483 1.0 5.2508e+00 4.6 2.18e+08 1.0 0.0e+00 0.0e+00 1.5e+03 1 1 0 0 33 1 1 0 0 33 665 VecCopy 4 1.0 3.2623e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 8891 1.0 2.5386e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 4444 1.0 6.6341e+00 1.6 6.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1578 VecAYPX 2960 1.0 4.2830e+00 1.7 4.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1628 VecAssemblyBegin 6 1.0 4.0186e-0213.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 6 1.0 6.0081e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 2964 1.0 6.2569e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 558 VecScatterBegin 2963 1.0 2.9219e+00 4.0 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 2963 1.0 5.0568e+01 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 KSPSetup 1 1.0 5.8019e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.1573e+02 1.0 3.92e+10 1.1 1.1e+05 1.8e+05 4.4e+03 88100100100 99 88100100100 99 2834 PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 2964 1.0 6.2830e+00 1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 556 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 3 42424600 0 Vec 18 18 7924896 0 Vec Scatter 2 2 1736 0 Index Set 4 4 247632 0 Krylov Solver 1 1 832 0 Preconditioner 1 1 872 0 Viewer 1 1 544 0 ======================================================================================================================== Average time to get PetscTime(): 1.38998e-05 Average time for MPI_Barrier(): 0.00011363 Average time for zero size MPI_Send(): 2.03103e-05 #PETSc Option Table entries: -ksp_type bicg -log_summary -pc_type jacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Dec 22 18:24:43 2010 Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1 ----------------------------------------- Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04 Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized Using PETSc arch: linux-gnu-c-opt-ch3nemsis ----------------------------------------- Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O ----------------------------------------- Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include ------------------------------------------ Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl 
-lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ From jed at 59A2.org Wed Dec 22 16:08:37 2010 From: jed at 59A2.org (Jed Brown) Date: Wed, 22 Dec 2010 23:08:37 +0100 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 18:53, Satish Balay wrote: > You'll have to look for the CPU/chipset hardware documentation - and > look at the details - and sometimes they mention these details.. > hwloc shows the cache hierarchy. http://www.open-mpi.org/projects/hwloc/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Dec 22 18:43:30 2010 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Dec 2010 16:43:30 -0800 Subject: [petsc-users] ParMETIS question In-Reply-To: <4D1204A9.7090209@tu-dresden.de> References: <4D10B086.2090503@tu-dresden.de> <4D11BFCA.2030800@tu-dresden.de> <4D1204A9.7090209@tu-dresden.de> Message-ID: On Wed, Dec 22, 2010 at 6:01 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > So, I found the problem related to empty partitions. It is not possible to > weight vertices (i.e. elements of a mesh) in such a way that one weight is > much higher than the other ones. For more details see > > http://glaros.dtc.umn.edu/flyspray/task/11 > > Its a pity that ParMetis makes is very hard to find this kind of errors. > > The open question for me is about the non continuous partitions. Is it a > normal behavior of ParMetis to create partitions that are not continous? Yes, this is normal. Matt > > Thomas > > > Thomas Witkowski wrote: > >> Okay, in my computations, I have empty partitions on some ranks and >> definitely not >> minimal boundary sizes. So may be I generate a wrong input. But if this is >> the case, I >> wonder why the resulting mesh partitioning is quite good. If I neglect the >> problem of >> empty partitions, the redistributed mesh leads to a very good load >> balancing. Is there >> any meaningful way to debug the problem? Is there something link a >> "verbose mode" in >> ParMetis that says me whats happen on the input data? Otherwise I have to >> print all the >> input data to the screen and check it by hand. Although I have a quite >> small example with >> 128 overall coarse mesh elements on 8 ranks, this is not big fun :) >> >> Thomas >> >> @Matthew: By mistake I've answered your mail directly to you and not to >> the mailing list, therefore I sent it now here again >> >> Matthew Knepley wrote: >> >>> On Tue, Dec 21, 2010 at 5:49 AM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.de > >>> wrote: >>> >>> Hi, >>> >>> I have a not directly PETSc related question, but I hope to get >>> some answer from the community here. In my FEM code, I make use of >>> ParMETIS to partition the mesh. I make direct use of this library >>> and not of PETSc's ParMETIS integration. The initial partition is >>> always fine, but I use the ParMETIS_V3_AdaptiveRepart function for >>> repartition the mesh due to local mesh adaption. In most cases, >>> the result is fine, but there are two points, where I have trouble >>> with: >>> >>> 1) Sometimes ParMETIS generates empty partitions, i.e., a >>> processor has zero mesh elements. This is something my code cannot >>> handle. Is this a bug or a feature? If it is a feature, is there >>> any possiblity to disable it? 
>>>
>>>
>>> ParMetis has a balance constraint if you weight vertices. This will
>>> enforce equal size partitions.
>>>
>>> 2) In most cases the specific partitions are not connected. If I
>>> put all data to ParMETIS in a correct way, is this okay? My code
>>> can handle it, but is slows down the computation due to larger
>>> interior boundaries and therefore to more communications.
>>>
>>>
>>> ParMetis minimizes the overall boundary size, so I do not understand how
>>> you could see this slowdown.
>>>
>>> Matt
>>>
>>> Does anyone of you know an answer to these question? Is there a
>>> debug mode in ParMETIS, where I can see which data is set to its
>>> function calls?
>>>
>>> Regards,
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>
>>
>>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com  Wed Dec 22 19:03:52 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 22 Dec 2010 17:03:52 -0800
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen wrote:
>
>
> On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
>
>> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>>
>> > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay
>> wrote:
>> >
>> > > Thanks a lot, Satish. It is much clear now. But for the choice of the
>> two,
>> > the program dmidecode does not show this information. Do you know any
>> way to
>> > get it?
>>
>> why do you expect dmidecode to show that?
>>
>> You'll have to look for the CPU/chipset hardware documentation - and
>> look at the details - and sometimes they mention these details..
>>
>> Satish
>>
>
>
> Thanks, Satish. Yes, I need to check it.
> Just now I re-configured PETSC with the option --with-device=ch3:nemsis.
> The results are almost the same as --with-device=ch3:sock. As can be seen in
> the attachment.
> I hope the matrix partitioning - reordering algorithm would have some
> positive effects.
>

1) To see a large gain, the ordering you start with would have to be very
bad. Maybe it is. These orderings try to minimize bandwidth, which means
minimizing communication in the MatMult.

2) If you use incomplete factorization, the ordering can have a large effect
on conditioning, and thus on the number of iterations, which does not improve
scalability. This would impact scalability if you use a parallel IC; however,
all those packages reorder your matrix already.

In short, I suspect this will not help a lot, except maybe with conditioning,
which is what I was referring to in the quote.

   Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
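A minimal sketch of what the reordering discussed above can look like through
PETSc's ordering interface. This is not code from the thread: it assumes an
already assembled AIJ-format matrix A whose type supports MatPermute(), uses
RCM as just one possible bandwidth-reducing ordering, and follows the
petsc-3.1 calling conventions used elsewhere in this thread.

    IS             rperm, cperm;
    Mat            Aperm;
    PetscErrorCode ierr;

    /* Compute a reverse Cuthill-McKee ordering of A and build the permuted matrix. */
    ierr = MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm);CHKERRQ(ierr);
    ierr = MatPermute(A, rperm, cperm, &Aperm);CHKERRQ(ierr);

    /* Solve with Aperm instead of A.  The right-hand side has to be permuted the
       same way (e.g. VecPermute(b, rperm, PETSC_FALSE)), and the computed solution
       mapped back with the inverse permutation after the solve. */

    ierr = ISDestroy(rperm);CHKERRQ(ierr);
    ierr = ISDestroy(cperm);CHKERRQ(ierr);

Whether this pays off depends, as noted above, on how bad the initial
ordering is; a mesh generator that already numbers the unknowns well leaves
little to gain.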
From jed at 59A2.org  Wed Dec 22 20:32:00 2010
From: jed at 59A2.org (Jed Brown)
Date: Thu, 23 Dec 2010 03:32:00 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

I disagree, there is easily a factor of two in flop/s between a naive
ordering (e.g. hierarchical by node type in a finite element method) and a
good low-bandwidth ordering.

This is in the FUN3D papers and still true today, in my experience.

Incomplete factorization is also very order dependent, as you note.

Jed

On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote:

On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen wrote:
>
>
>
> On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
>>
>> On Wed, 22 De...

1) To see a large gain, the ordering you start with would have to be very
bad. Maybe it is. These orderings try to minimize bandwidth, which means
minimize communication in the MatMult.

2) If you use incomplete facotrization, the ordering can have a large effect
on conditioning, so number of iterations, which does not improve scalability.
This would impact scalability if you use a parallel IC, however all those
packages reorder your matrix already.

In short, I suspect this will not help a lot, except maybe with
conditioning, which is what I was refering to in the quote.

Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yjxd.chen at gmail.com  Thu Dec 23 02:28:48 2010
From: yjxd.chen at gmail.com (Yongjun Chen)
Date: Thu, 23 Dec 2010 09:28:48 +0100
Subject: [petsc-users] Very poor speed up performance
In-Reply-To:
References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov>
Message-ID:

Matt, Jed, thanks a lot for the discussions. Since the ordering could
minimize the bandwidth, I think it is really worth trying the matrix
partitioning / ordering. If there is a factor of two increase in the flop
rate, that's quite promising!

On Thu, Dec 23, 2010 at 3:32 AM, Jed Brown wrote:

> I disagree, there is easily a factor of two in flop/s between a naive
> ordering (e.g. hierarchical by node type in a finite element method) and a
> good low-bandwidth ordering.
>
> This is in the FUN3D papers and still true today, in my experience.
>
> Incomplete factorization is also very order dependent, as you note.
>
> Jed
>
> On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote:
>
> On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen
> wrote:
> >
> >
> > On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote:
> >>
> >> On Wed, 22 De...
>
> 1) To see a large gain, the ordering you start with would have to be very
> bad. Maybe it is. These
> orderings try to minimize bandwidth, which means minimize communication
> in the MatMult.
>
> 2) If you use incomplete facotrization, the ordering can have a large
> effect on conditioning, so
> number of iterations, which does not improve scalability. This would
> impact scalability if you
> use a parallel IC, however all those packages reorder your matrix
> already.
>
> In short, I suspect this will not help a lot, except maybe with
> conditioning, which is what I was refering to in the quote.
>
> Matt
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more...
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From knepley at gmail.com Thu Dec 23 19:45:04 2010 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 23 Dec 2010 17:45:04 -0800 Subject: [petsc-users] Very poor speed up performance In-Reply-To: References: <892FB8CF-27B2-4366-9927-31A0FC062E63@mcs.anl.gov> Message-ID: On Wed, Dec 22, 2010 at 6:32 PM, Jed Brown wrote: > I disagree, there is easily a factor of two in flop/s between a naive > ordering (e.g. hierarchical by node type in a finite element method) and a > good low-bandwidth ordering. > > This is in the FUN3D papers and still true today, in my experience. > There is no doubt that this difference can exist, but many mesh generators (such as triangle) give back a good ordering. FUN3D used an inexplicably bad ordering. Matt > Incomplete factorization is also very order dependent, as you note. > > Jed > > On Dec 22, 2010 5:03 PM, "Matthew Knepley" wrote: > > On Wed, Dec 22, 2010 at 10:11 AM, Yongjun Chen > wrote: > > > > > > > > > On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay wrote: > >> > >> On Wed, 22 De... > > 1) To see a large gain, the ordering you start with would have to be very > bad. Maybe it is. These > orderings try to minimize bandwidth, which means minimize communication > in the MatMult. > > 2) If you use incomplete facotrization, the ordering can have a large > effect on conditioning, so > number of iterations, which does not improve scalability. This would > impact scalability if you > use a parallel IC, however all those packages reorder your matrix > already. > > In short, I suspect this will not help a lot, except maybe with > conditioning, which is what I was refering to in the quote. > > Matt > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more... > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 25 22:17:44 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 25 Dec 2010 22:17:44 -0600 Subject: [petsc-users] Using PETSc from MATLAB code, experimental Message-ID: <28CA197A-02C8-41FD-BF2D-E344FBC3CF97@mcs.anl.gov> PETSc users, It is now possible to write MATLAB programs (sequential) that use PETSc KSP, SNES, and TS solvers directly in MATLAB. The code is still experimental and incomplete. But if you are interested in trying it out, get the development release of PETSc http://www.mcs.anl.gov/petsc/petsc-as/developers/index.html join the development mailing list petsc-dev http://www.mcs.anl.gov/petsc/petsc-as/miscellaneous/mailing-lists.html, read bin/matlab/classes/PetscInitialize.m, configure and make PETSc and join the fun. We are definitely in need of more developers for this code. Barry From PRaeth at hpti.com Mon Dec 27 09:09:18 2010 From: PRaeth at hpti.com (Raeth, Peter) Date: Mon, 27 Dec 2010 15:09:18 +0000 Subject: [petsc-users] Meaning of MatStencil Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM> Am a new PETSc user trying to make use of the matrix level to create and operate on matrices whose memory exceeds that available on any one node. To populate a distributed dense matrix with results of other matrix calculations we are trying to use MatSetValuesBlockedStencil. Two of the inputs to that function require structures of type MatStencil. 
After searching the archives, tutorials, examples, and Google, I can not find
anything that explains what the values of MatStencil are meant to do relative
to where in the target matrix to place the block of values.

Would someone please point me in the right direction?

Thanks,

Peter.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aron.ahmadia at kaust.edu.sa  Mon Dec 27 09:44:50 2010
From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia)
Date: Mon, 27 Dec 2010 10:44:50 -0500
Subject: [petsc-users] Meaning of MatStencil
In-Reply-To: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>
References: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>
Message-ID:

MatStencil only makes sense if you are using a distributed grid (DA), where
it corresponds to physical field locations. You probably just want
MatSetValuesBlocked (
http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html
)

Warm Regards,
Aron

On Mon, Dec 27, 2010 at 10:09 AM, Raeth, Peter wrote:

> Am a new PETSc user trying to make use of the matrix level to create
> and operate on matrices whose memory exceeds that available on any one node.
> To populate a distributed dense matrix with results of other matrix
> calculations we are trying to use MatSetValuesBlockedStencil. Two of the
> inputs to that function require structures of type MatStencil. After
> searching the archives, tutorials, examples, and Google, I can not find
> anything that explains what the values of MatStencil are meant to do
> relative to where in the target matrix to place the block of values.
> Would someone please point me in the right direction?
>
> Thanks,
>
> Peter.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From PRaeth at hpti.com  Mon Dec 27 09:47:25 2010
From: PRaeth at hpti.com (Raeth, Peter)
Date: Mon, 27 Dec 2010 15:47:25 +0000
Subject: [petsc-users] Meaning of MatStencil
In-Reply-To:
References: <3474F869C1954540B771FD9CAEBCB65704A9A833@CORTINA.HPTI.COM>,
Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9A859@CORTINA.HPTI.COM>

AH !   Thanks very much Aron.

Best,

Peter.

________________________________
From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Aron Ahmadia [aron.ahmadia at kaust.edu.sa]
Sent: Monday, December 27, 2010 10:44 AM
To: PETSc users list
Subject: Re: [petsc-users] Meaning of MatStencil

MatStencil only makes sense if you are using a distributed grid (DA), where
it corresponds to physical field locations. You probably just want
MatSetValuesBlocked
(http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html)

Warm Regards,
Aron

On Mon, Dec 27, 2010 at 10:09 AM, Raeth, Peter > wrote:
Am a new PETSc user trying to make use of the matrix level to create and
operate on matrices whose memory exceeds that available on any one node. To
populate a distributed dense matrix with results of other matrix calculations
we are trying to use MatSetValuesBlockedStencil. Two of the inputs to that
function require structures of type MatStencil. After searching the archives,
tutorials, examples, and Google, I can not find anything that explains what
the values of MatStencil are meant to do relative to where in the target
matrix to place the block of values. Would someone please point me in the
right direction?

Thanks,

Peter.

-------------- next part --------------
An HTML attachment was scrubbed...
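A small sketch of the MatSetValuesBlocked() usage referenced above, for
readers coming to this thread from the archives. It is illustrative only: the
matrix A, its block size of 2 (set beforehand with MatSetBlockSize()), and
the particular block indices are assumptions rather than code from this
exchange; values are interpreted row-major by default.

    PetscErrorCode ierr;
    PetscInt       brow    = 3;              /* global block-row index     */
    PetscInt       bcol    = 5;              /* global block-column index  */
    PetscScalar    vals[4] = {1.0, 2.0,      /* one 2x2 block, row-major   */
                              3.0, 4.0};

    /* Insert the 2x2 block at block position (3,5); with block size 2 this
       touches the scalar entries (6,10), (6,11), (7,10) and (7,11). */
    ierr = MatSetValuesBlocked(A, 1, &brow, 1, &bcol, vals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

For a distributed dense matrix, plain MatSetValues() with scalar row/column
indices is often just as convenient; the blocked variant mainly pays off when
the matrix really is organized in fixed-size blocks (e.g. BAIJ formats).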
URL: From gdiso at ustc.edu Wed Dec 29 23:00:19 2010 From: gdiso at ustc.edu (Gong Ding) Date: Thu, 30 Dec 2010 13:00:19 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix Message-ID: Dear all, I found the pastix solver petsc-3.1-p4 ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 For a poisson problem with symmetric matrix, pastix works well. However, for unsymmetric problem, the code break. valgrind reported that: Check : Sort CSC OK ==4959== Invalid read of size 4 ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) ==4959== [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; may be block not allocated with PetscMalloc() The test problems used to work well under other linear solvers such as MUMPS and superlu. Does any meet this problem before? Yours Gong Ding From PRaeth at hpti.com Thu Dec 30 06:51:58 2010 From: PRaeth at hpti.com (Raeth, Peter) Date: Thu, 30 Dec 2010 12:51:58 +0000 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: <3474F869C1954540B771FD9CAEBCB65704A9AA61@CORTINA.HPTI.COM> Good morning Ding. Sorry to not have an answer since I do not try to solve that type of problem. But, are the matrices formed and populated in a way that is appropriate to solution with PETSc? If that is the case then there does appear to be an issue with PETSc itself. I'll bet the developers would love to have your example. 
My own development exercise is to compute the Kronecker tensor product where all components are distributed. Am using the Matrix and Vector level of PETSc, with some use of higher levels. It is a most interesting undertaking. Best, Peter. From bsmith at mcs.anl.gov Thu Dec 30 08:55:08 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 08:55:08 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: Please run program with -ksp_view_binary and send us the file binaryoutput to petsc-maint at mcs.anl.gov so we can reproduce the problem. Barry On Dec 29, 2010, at 11:00 PM, Gong Ding wrote: > Dear all, > > I found the pastix solver > > petsc-3.1-p4 > ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 > > For a poisson problem with symmetric matrix, pastix works well. > However, for unsymmetric problem, the code break. valgrind reported that: > Check : Sort CSC OK > ==4959== Invalid read of size 4 > ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) > ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) > ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) > ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) > ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) > ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) > ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) > ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) > ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) > ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) > ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) > ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) > ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd > ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) > ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) > ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) > ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) > ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) > ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) > ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) > ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) > ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) > ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) > ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) > ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) > ==4959== > [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c > [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; > may be block not allocated with PetscMalloc() > > The test problems used to work well under other linear solvers such as MUMPS and superlu. > Does any meet this problem before? 
> > Yours > Gong Ding > From gdiso at ustc.edu Thu Dec 30 20:44:20 2010 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 31 Dec 2010 10:44:20 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix References: Message-ID: <3D98059130724B9FA0F5D2BB1F21C849@cogendaeda> ----- Original Message ----- From: "Barry Smith" To: "PETSc users list" Sent: Thursday, December 30, 2010 10:55 PM Subject: Re: [petsc-users] pastix solver break at pastix_checkMatrix Please run program with -ksp_view_binary and send us the file binaryoutput to petsc-maint at mcs.anl.gov so we can reproduce the problem. Barry Ok, I use pastix as linear solver of SNES. (The coding style of pastix seems much better than mumps. I'd like to try if I can speed it by GPU-based BLAS.) The settings are listed as follows: ierr = KSPSetType (ksp, (char*) KSPPREONLY); assert(!ierr); ierr = PCSetType (pc, (char*) PCLU); assert(!ierr); ierr = PCFactorSetMatSolverPackage (pc, MAT_SOLVER_PASTIX); assert(!ierr); ierr = PCFactorSetReuseFill(pc, PETSC_TRUE);assert(!ierr); ierr = PCFactorSetReuseOrdering(pc, PETSC_TRUE); assert(!ierr); ierr = PCFactorSetColumnPivot(pc, 1.0); genius_assert(!ierr); //ierr = PCFactorReorderForNonzeroDiagonal(pc, 1e-20); assert(!ierr);<-- Caught signal number 11 SEGV error will occure when diag value < 1e-20 ierr = PCFactorSetShiftType(pc,MAT_SHIFT_NONZERO);assert(!ierr); The attached matrix is dumped at the end of jacobian assemble. -------------- next part -------------- A non-text attachment was scrubbed... Name: break.mat Type: application/octet-stream Size: 208768 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Dec 30 20:45:25 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 20:45:25 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: References: Message-ID: Yes, there is an error in how we called one of the pastix routines. It required certain arrays be obtained with a raw malloc(). Please find attached a new src/mat/impls/aij/mpi/pastix/pastix.c put it in that location and run make in that directory. Thanks for reporting the problem and sending the valgrind output Barry On Dec 29, 2010, at 11:00 PM, Gong Ding wrote: > Dear all, > > I found the pastix solver > > petsc-3.1-p4 > ./configure --with-vendor-compilers=intel --with-blas-lapack-dir=/opt/intel/mkl/10.2.0.013/lib/em64t/ --download-pastix --download-scotch --with-shared=1 --with-debugging=1 > > For a poisson problem with symmetric matrix, pastix works well. > However, for unsymmetric problem, the code break. 
valgrind reported that: > Check : Sort CSC OK > ==4959== Invalid read of size 4 > ==4959== at 0x1241931: PetscTrFreeDefault (mtr.c:280) > ==4959== by 0x150448A: MatConvertToCSC (pastix.c:188) > ==4959== by 0x1506638: MatFactorNumeric_PaStiX (pastix.c:390) > ==4959== by 0x13980AB: MatLUFactorNumeric (matrix.c:2587) > ==4959== by 0x16AB0A6: PCSetUp_LU (lu.c:158) > ==4959== by 0x1A9BD42: PCSetUp (precon.c:795) > ==4959== by 0x16FA8D0: KSPSetUp (itfunc.c:237) > ==4959== by 0x16FBB2A: KSPSolve (itfunc.c:353) > ==4959== by 0x17BEC6D: SNES_KSPSolve (snes.c:2944) > ==4959== by 0x17CEFEA: SNESSolve_LS (ls.c:191) > ==4959== by 0x17B8B78: SNESSolve (snes.c:2255) > ==4959== by 0x10B969D: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:820) > ==4959== Address 0x88e2f08 is 8 bytes inside a block of size 40 free'd > ==4959== at 0x4A05B16: operator delete(void*) (vg_replace_malloc.c:387) > ==4959== by 0xA42A53: __gnu_cxx::new_allocator >::deallocate(std::_Rb_tree_node*, unsigned long) (new_allocator.h:94) > ==4959== by 0xA41863: std::_Rb_tree, std::less, std::allocator >::_M_put_node(std::_Rb_tree_node*) (stl_tree.h:362) > ==4959== by 0xA419C8: std::_Rb_tree, std::less, std::allocator >::destroy_node(std::_Rb_tree_node*) (stl_tree.h:392) > ==4959== by 0xA42501: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator) (stl_tree.h:1189) > ==4959== by 0xA4264A: std::_Rb_tree, std::less, std::allocator >::erase(std::_Rb_tree_iterator, std::_Rb_tree_iterator) (stl_tree.h:1281) > ==4959== by 0xA4257B: std::_Rb_tree, std::less, std::allocator >::erase(CTRI::Triangle* const&) (stl_tree.h:1215) > ==4959== by 0xA415E2: std::set, std::allocator >::erase(CTRI::Triangle* const&) (stl_set.h:387) > ==4959== by 0xA3F6BD: CTRI::Triangle::~Triangle() (c_triangle.cc:109) > ==4959== by 0xA43F16: CTRI::TriMesh::~TriMesh() (c_trimesh.cc:163) > ==4959== by 0xA3F0C6: ctri_triangulate (c_tri_io.cc:35) > ==4959== by 0xD53FB9: MeshGeneratorTri3::do_refine(MeshRefinement&) (mesh_generation_tri3.cc:1369) > ==4959== > [0]PETSC ERROR: PetscTrFreeDefault() called from MatConvertToCSC() line 188 in src/mat/impls/aij/mpi/pastix/pastix.c > [0]PETSC ERROR: Block at address 0x88e2ee0 is corrupted; cannot free; > may be block not allocated with PetscMalloc() > > The test problems used to work well under other linear solvers such as MUMPS and superlu. > Does any meet this problem before? > > Yours > Gong Ding > -------------- next part -------------- A non-text attachment was scrubbed... Name: pastix.c Type: application/octet-stream Size: 27993 bytes Desc: not available URL: From gdiso at ustc.edu Thu Dec 30 21:49:01 2010 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 31 Dec 2010 11:49:01 +0800 Subject: [petsc-users] pastix solver break at pastix_checkMatrix References: Message-ID: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> Dear Barry, First, the patched file has some evident problem. PetscScalar *tmpvalues; PetscInt *tmprows,*tmpcolptr; tmpvalues = malloc(nnz*sizeof(PetscScalar)); tmprows = malloc(nnz*sizeof(PetscInt)); tmpcolptr = malloc((*n+1)*sizeof(PetscInt)); ierr = PetscMalloc3(nnz,PetscScalar,&tmpvalues,nnz,PetscInt,&tmprows,(*n+1),PetscInt,&tmpcolptr);CHKERRQ(ierr); <-- this line alloc meory again. After comment above line, the pastix works for the first nonlinear iteration. However, it breaks at the second iteration. valgrind reported: DDM Solver Level 1 init... Using PaStiX linear solver... 
Compute equilibrium its | Eq(V) | | Eq(n) | | Eq(p) | | Eq(T) | |Eq(Tn)| |Eq(Tp)| |delta x| ----------------------------------------------------------------------------- 0 2.50e-06 2.34e-03 3.12e-04 0.00e+00* 0.00e+00* 0.00e+00* 0.00e+00* Check : ordering OK Check : Graph Symmetry Correction Add 4090 null terms OK Check : Sort CSC OK 1 2.06e-05 7.29e-04 1.03e-04 0.00e+00* 0.00e+00* 0.00e+00* 3.85e-01 Check : ordering OK Check : Graph Symmetry==1416== Thread 1: ==1416== Invalid read of size 4 ==1416== at 0x1BC3186: csc_checksym (csc_utils.c:321) ==1416== by 0x1B4E7E3: pastix_checkMatrix (pastix.c:3915) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8aeb1a8 is not stack'd, malloc'd or (recently) free'd ==1416== Correction==1416== Invalid read of size 4 ==1416== at 0x1BD1147: correct2 (cscsymcsc.c:77) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8aeb1a8 is not stack'd, malloc'd or (recently) free'd ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD10D7: correct2 (cscsymcsc.c:67) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD10EB: correct2 (cscsymcsc.c:72) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX 
(pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1102: correct2 (cscsymcsc.c:75) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8d369dc is 4 bytes before a block of size 4,216 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x150812B: MatConvertToCSC (pastix.c:169) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD116D: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) 
==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1179: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8c1b55c is 4 bytes before a block of size 4,212 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x1BD1029: correct2 (cscsymcsc.c:53) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== ==1416== Invalid read of size 4 ==1416== at 0x1BD1186: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8ae7574 is 0 bytes after a block of size 68,180 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x15080FF: MatConvertToCSC (pastix.c:168) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== by 0x10BAB53: FVM_NonlinearSolver::sens_solve() (fvm_nonlinear_solver.cc:824) ==1416== ==1416== Invalid write of size 4 ==1416== at 0x1BD1192: correct2 (cscsymcsc.c:88) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) 
==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== by 0x17BCF6C: SNESSolve (snes.c:2255) ==1416== Address 0x8c1b55c is 4 bytes before a block of size 4,212 alloc'd ==1416== at 0x4A061EF: malloc (vg_replace_malloc.c:236) ==1416== by 0x1BD1029: correct2 (cscsymcsc.c:53) ==1416== by 0x1B4E8EA: pastix_checkMatrix (pastix.c:3930) ==1416== by 0x1508661: MatConvertToCSC (pastix.c:185) ==1416== by 0x150AA2E: MatFactorNumeric_PaStiX (pastix.c:396) ==1416== by 0x139C90B: MatLUFactorNumeric (matrix.c:2587) ==1416== by 0x16AF49A: PCSetUp_LU (lu.c:158) ==1416== by 0x1AA0136: PCSetUp (precon.c:795) ==1416== by 0x16FECC4: KSPSetUp (itfunc.c:237) ==1416== by 0x16FFF1E: KSPSolve (itfunc.c:353) ==1416== by 0x17C3061: SNES_KSPSolve (snes.c:2944) ==1416== by 0x17D33DE: SNESSolve_LS (ls.c:191) ==1416== [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below From bsmith at mcs.anl.gov Thu Dec 30 22:01:05 2010 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 30 Dec 2010 22:01:05 -0600 Subject: [petsc-users] pastix solver break at pastix_checkMatrix In-Reply-To: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> Message-ID: <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov> Well that, obviously was erroneously left in. Remove that line, recompile in that directory and it should run ok. Barry On Dec 30, 2010, at 9:49 PM, Gong Ding wrote: > Dear Barry, > First, the patched file has some evident problem. > > PetscScalar *tmpvalues; > PetscInt *tmprows,*tmpcolptr; > tmpvalues = malloc(nnz*sizeof(PetscScalar)); > tmprows = malloc(nnz*sizeof(PetscInt)); > tmpcolptr = malloc((*n+1)*sizeof(PetscInt)); > > ierr = PetscMalloc3(nnz,PetscScalar,&tmpvalues,nnz,PetscInt,&tmprows,(*n+1),PetscInt,&tmpcolptr);CHKERRQ(ierr); <-- this line alloc meory again. > > After comment above line, the pastix works for the first nonlinear iteration. However, it breaks at the second iteration. valgrind reported: > > DDM Solver Level 1 init... > Using PaStiX linear solver... 
From gdiso at ustc.edu Thu Dec 30 22:39:52 2010
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 31 Dec 2010 12:39:52 +0800
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov>
Message-ID:

Dear Barry,
As I mentioned, it only works for the first nonlinear iteration and will break at the second! I guess the MatConvertToCSC function should be considered again, i.e., optimized for a nonlinear solver that will call PaStiX many times.

Gong Ding

> Well, that obviously was erroneously left in. Remove that line, recompile in that directory, and it should run OK.
>
>    Barry
From gdiso at ustc.edu Thu Dec 30 22:47:52 2010
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 31 Dec 2010 12:47:52 +0800
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov>
Message-ID: <76538ECE086E482FB1366C668A1EA28A@cogendaeda>

Dear Barry,
Sorry, I should add that when I set DIFFERENT_NONZERO_PATTERN as the MatStructure, PaStiX works well, but it crashes with SAME_NONZERO_PATTERN.
However, I have always used SAME_NONZERO_PATTERN before, and it works for the KSP solvers, MUMPS, SuperLU_DIST, etc.

Yours,
Gong Ding

From bsmith at mcs.anl.gov Fri Dec 31 12:12:29 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 31 Dec 2010 12:12:29 -0600
Subject: [petsc-users] pastix solver break at pastix_checkMatrix
In-Reply-To: <76538ECE086E482FB1366C668A1EA28A@cogendaeda>
References: <54B6E01485354315BF5C9040319E4F0C@cogendaeda> <2DF65225-7B22-43A8-A2D3-DE2DD4B75BF6@mcs.anl.gov> <76538ECE086E482FB1366C668A1EA28A@cogendaeda>
Message-ID:

Sorry, yes, there was another bug. What caused both of these problems is that PaStiX requires a symmetric nonzero structure, and the interface assumed that the PETSc matrix had a symmetric nonzero structure, which is usually true; hence it did not crash for most of the matrices people use. I've attached another copy of pastix.c; follow the same procedure as before.

   Barry

On Dec 30, 2010, at 10:47 PM, Gong Ding wrote:

> Dear Barry,
> Sorry, I should add that when I set DIFFERENT_NONZERO_PATTERN as the MatStructure, PaStiX works well, but it crashes with SAME_NONZERO_PATTERN.
> However, I have always used SAME_NONZERO_PATTERN before, and it works for the KSP solvers, MUMPS, SuperLU_DIST, etc.
>
> Yours,
> Gong Ding

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastix.c
Type: application/octet-stream
Size: 28044 bytes
Desc: not available
URL:
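Until the fixed pastix.c is verified, the workaround Gong Ding describes amounts to returning DIFFERENT_NONZERO_PATTERN from the application's Jacobian routine. A minimal sketch follows (PETSc 3.1 calling sequence); FormJacobian and its body are placeholders rather than code from his solver, and the actual matrix assembly is elided:

    #include <petscsnes.h>

    /* Hypothetical sketch: the MatStructure flag returned here is what SNES
     * hands to the linear solver, so reporting DIFFERENT_NONZERO_PATTERN
     * should make the PaStiX interface redo its symbolic setup at every
     * Newton step instead of reusing cached data. */
    static PetscErrorCode FormJacobian(SNES snes,Vec x,Mat *J,Mat *B,
                                       MatStructure *flag,void *ctx)
    {
      PetscErrorCode ierr;
      /* ... assemble the Jacobian entries of *B (and *J) from x here ... */
      ierr  = MatAssemblyBegin(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr  = MatAssemblyEnd(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *flag = DIFFERENT_NONZERO_PATTERN;  /* workaround: was SAME_NONZERO_PATTERN */
      return 0;
    }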
From pengxwang at hotmail.com Fri Dec 31 12:23:12 2010
From: pengxwang at hotmail.com (Peter Wang)
Date: Fri, 31 Dec 2010 12:23:12 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
Message-ID:

I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com Fri Dec 31 12:28:05 2010
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 31 Dec 2010 12:28:05 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
In-Reply-To:
References:
Message-ID:

On Fri, Dec 31, 2010 at 12:23 PM, Peter Wang wrote:
> I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.

There are two limitations:

a) Machine memory. To combat this, we recommend running in parallel.

b) Representation of row/col numbers by integers. If you have > 2B rows, you will need to reconfigure using --with-64-bit-ints.

Thanks,

    Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at mcs.anl.gov Fri Dec 31 12:39:28 2010
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 31 Dec 2010 12:39:28 -0600
Subject: [petsc-users] the size of matrix in PETSc solver
In-Reply-To:
References:
Message-ID:

On Dec 31, 2010, at 12:28 PM, Matthew Knepley wrote:

> On Fri, Dec 31, 2010 at 12:23 PM, Peter Wang wrote:
> I am trying to build a big matrix to be solved by KSP. I am wondering whether there is any limitation on the matrix size. What is the maximum size of the matrix in KSP? For example, can a 100M by 100M matrix be handled by KSP? Thanks a lot.
>
> There are two limitations:
>
> a) Machine memory. To combat this, we recommend running in parallel.
>
> b) Representation of row/col numbers by integers. If you have > 2B rows, you will need to reconfigure using --with-64-bit-ints.

   --with-64-bit-indices

> Thanks,
>
>     Matt
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
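As a rough illustration of point (b) above, here is a sketch (not code from the thread, PETSc 3.1-era calling sequence assumed): a 100M-by-100M sparse matrix still fits comfortably within 32-bit indices, and --with-64-bit-indices only becomes necessary once the global dimension passes roughly 2.1 billion; whether enough aggregate memory exists for the nonzeros is the separate issue point (a) addresses.

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      PetscInt       N = 100000000;  /* 100M rows/columns: fits in a 32-bit PetscInt */
      PetscErrorCode ierr;

      /* For more than ~2.1 billion rows, PetscInt must be 64 bits wide, i.e.
       * PETSc has to be configured with --with-64-bit-indices.  The matrix
       * itself would normally be preallocated, filled, and assembled before
       * being handed to KSP; that part is elided here. */
      ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      ierr = MatDestroy(A);CHKERRQ(ierr);   /* PETSc 3.1 calling sequence */
      ierr = PetscFinalize();
      return 0;
    }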