From recrusader at gmail.com Thu May 1 11:33:51 2008 From: recrusader at gmail.com (Yujie) Date: Thu, 1 May 2008 09:33:51 -0700 Subject: about MatMatMult() Message-ID: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> I have further checked this function. In MatMatMult(Mat A,Mat B,MatReuse scall,PetscReal fill,Mat *C) I am wondering why the type of C is MATAIJ when the types of A and B are MATAIJ. Although A and B are MATAIJ, C should be dense. If C uses the MATAIJ type, it should take more memory, is that right? Thanks a lot. Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu May 1 11:55:28 2008 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 1 May 2008 11:55:28 -0500 (CDT) Subject: about MatMatMult() In-Reply-To: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> References: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> Message-ID: Yujie, In general, C=A*B is denser than A and B. Thus, sparse matrix products should be avoided. PETSc's sparse MatMatMult() is intended to support the multigrid computation MatPtAP(), in which P is a projector and C=Pt*A*P maintains a similar sparsity. If your C=A*B is dense, you may set A and B in dense format. In the sequential case, PETSc calls LAPACK for MatMatMult(), which is much more efficient than the sparse implementation. Hong On Thu, 1 May 2008, Yujie wrote: > I have further checked this function. > In MatMatMult(Mat A,Mat B,MatReuse scall,PetscReal fill,Mat *C) > I am wondering > why the type of C is MATAIJ when the types of A and B are MATAIJ. > Although A and B are MATAIJ, C should be dense. If C uses the MATAIJ type, > it should take more memory, is that right? > > Thanks a lot. > > Regards, > Yujie > From recrusader at gmail.com Thu May 1 19:08:25 2008 From: recrusader at gmail.com (Yujie) Date: Thu, 1 May 2008 17:08:25 -0700 Subject: further about PCComputeExplicitOperator() Message-ID: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> When 1 processor is used, the matrix M in PCComputeExplicitOperator(pc,&M) uses the MATSEQDENSE type. Now I want to use MATSEQAIJ, so I changed the code as follows: 1563 if (size == 1) { 1564 //05/01/08 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); 1566 //ierr = MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); 1568 ierr = MatSeqAIJSetPreallocation(*mat,0,PETSC_NULL);CHKERRQ(ierr); 1569 1570 } else { 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); 1572 ierr = MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL,0,PETSC_NULL);CHKERRQ(ierr); 1573 } PCApply is fast when running. However, MatSetValues() is very, very slow when some arrays need to be set. After debugging the code, I find that the problem likely lies in MatSeqXAIJReallocateAIJ(A,A->rmap.n,1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) in MatSetValues_SeqAIJ(). I can't figure out where the problem is beyond that, because it is difficult to debug. Could you give me some advice? The version of PETSc is 2.3.3-p8. Thanks a lot. Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... 
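For the dense C=A*B case Hong describes above, a minimal sketch of the dense-format route (PETSc 2.3.x-style C calls; the matrix size n and the values are placeholders, and the actual filling of A and B is elided):

#include "petscmat.h"

int main(int argc,char **argv)
{
  Mat            A,B,C;
  PetscInt       n = 100;                 /* illustrative size only */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  /* build A and B as sequential dense matrices */
  ierr = MatCreateSeqDense(PETSC_COMM_SELF,n,n,PETSC_NULL,&A);CHKERRQ(ierr);
  ierr = MatCreateSeqDense(PETSC_COMM_SELF,n,n,PETSC_NULL,&B);CHKERRQ(ierr);
  /* ... fill A and B with MatSetValues() ... */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* C is created by the call; the fill ratio is not meaningful for dense input */
  ierr = MatMatMult(A,B,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = MatDestroy(B);CHKERRQ(ierr);
  ierr = MatDestroy(C);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

With AIJ input, as in the original question, the product C also comes back as AIJ, which is why storing a dense product that way costs extra memory for the column indices.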
URL: From amjad11 at gmail.com Fri May 2 11:12:20 2008 From: amjad11 at gmail.com (amjad ali) Date: Fri, 2 May 2008 21:12:20 +0500 Subject: Selection between C2D and Xeon 3000 for PETSc Sparse solvers In-Reply-To: References: <428810f20804220643r618753dayb3cae42b9f92b7e7@mail.gmail.com> Message-ID: <428810f20805020912s6bf13502s89fc63fe6044556c@mail.gmail.com> Hello Dr. Satish, I am still bit confused in taking decesion and wanting some more guidence from you and PETSc users/maint. *Question ONE:* Please help me in selecting out one of the following two clusters, *bit updated/changed from my previous choices.* As I am going to make a gigabit ethernet cluster of 4 compute nodes (totaling 8 cores), with each node having: (Choice 1) One Processor: Intel Core2Duo E6750 2.66GHz, FSB 1333MHz, 4MB L2. Motherboard: Intel Desktop Board DX38BT with Intel X38 Chipset supporting 1333/1066/800 MHz system bus. RAM: 2GB DDR3 1333MHz ECC System Memory. (Choice 2) One Processor: Intel Xeon 3075 2.66GHz, FSB1333, 4MBL2. Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 Chipset supporting 1333/1066/800 MHz system bus. RAM: 2GB DDR2 800MHz ECC System Memory. Which one system has larger memory-bandwidth/CPU-core? Better for PETSc? Any other comment/remark? My area work deals in sparse matrices. I near future I would like to add 12 similar compute nodes in the cluster. So my decision should be optimum/long-lasting. *Question TWO:* I want to make ROCKS V cluster on Intel x86_64 machines (selected from the above options). ROCKS V is available separately for both i386 and x86_64 machines. But I have not seen two different version of PETSc for i386 and x86_64 machines. Is there only a single version for PETSc for both i386 and x86_64 machines? IF YES, then out of "ROCKS/OS i386" and "ROCKS/OS x86_64" which one is more suitable (efficiecy/speed/performance wise) for PETSc? Regards, Amjad Ali. On Tue, Apr 22, 2008 at 8:25 PM, Satish Balay wrote: > On Tue, 22 Apr 2008, amjad ali wrote: > > > Hello, > > > > Please help me out in selecting any one choice of the following: > > (Currently I am making a gigabit ethernet cluster of 4 compute nodes > > (totaling 8 cores), with each node having) > > > > (Choice 1) > > One Processor: Intel Core2Duo E6750 2.66 GHz Processor, FSB 1333MHz, 4MB > L2. > > Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 > Chipset > > supporting 1333/1066/800 MHz FSB . > > RAM: 2GB DDR2 800MHz ECC System Memory. > > > > (Choice 2) > > One Processor: Intel Xeon 3075 2.66 GHz FSB1333 4MBL2. > > Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 > > Chipset supporting 1333/1066/800 MHz FSB . > > RAM: 2GB DDR2 800MHz ECC System Memory. > > > > Which one system has larger memory-bandwidth/CPU-core? > > Any other comment/remark? > > My area work deals in sparse matrices. > > I near future I would like to add 12 similar compute nodes in the > cluster. > > Based on the above numbers - the memory bandwidth numbers should be > the same. And I expect the performance to be the same in both cases. > > Ideally you would have access to both machines [perhaps from the > vendor] - and run streams benchmark on each - to see if there is any > difference. > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... 
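Satish's suggestion above - compare the two boxes with a streams-type benchmark - can be approximated with a few lines of C if the packaged benchmark is not at hand. This is not the official STREAM code, just an illustration of what is being measured; the array length and the single repetition are arbitrary, and a real measurement would repeat the loop and keep the compiler from optimizing it away:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char **argv)
{
  const int N = 10000000;          /* arbitrary; large enough to exceed cache */
  double    *a,*b,*c,t,scalar = 3.0;
  int       i;

  MPI_Init(&argc,&argv);
  a = (double*)malloc(N*sizeof(double));
  b = (double*)malloc(N*sizeof(double));
  c = (double*)malloc(N*sizeof(double));
  for (i=0; i<N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  t = MPI_Wtime();
  for (i=0; i<N; i++) a[i] = b[i] + scalar*c[i];   /* triad: three doubles moved per iteration */
  t = MPI_Wtime() - t;

  printf("a[0]=%g  approx bandwidth %g MB/s\n",a[0],3.0*8.0*N/t/1.0e6);
  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Running this with 1, 2, ... processes per node (mpiexec -np 2 ./a.out, and so on) and watching the per-process number fall shows how much memory bandwidth each core really gets, which is what limits sparse matrix kernels.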
URL: From bsmith at mcs.anl.gov Fri May 2 11:50:21 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 2 May 2008 12:50:21 -0400 Subject: further about PCComputeExplicitOperator() In-Reply-To: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: Application of a preconditioner is almost always a dense operator. The only exception is sparse approximate inverse and in fact this is why the sparse approximate inverse is a lousy preconditioner. So I don't think you would ever want to compute the preconditioner into a sparse matrix. If you want the SPAI preconditioner explicitly as a PETSc sparse matrix you should go into that code, figure out how the SPAI stores its computed preconditioner and convert it to the PETSc format. Note also: PCComputeExplicitOperator - Computes the explicit preconditioned operator. this means it computes B*A or A*B (depending on left or right preconditioning). This beasty (unless you use Jacobi preconditioning) is always dense and it makes no sense to store in sparse format except for fun. Barry On May 1, 2008, at 8:08 PM, Yujie wrote: > when 1 processor is used, the matrix M in > PCComputeExplicitOperator(pc,&M) uses MATSEQDENSE type. Now, I want > to use MATSEQAIJ, I change the codes as follows: > 1563 if (size == 1) { > 1564 //05/01/08 > 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); > 1566 //ierr = > MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); > 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); > 1568 ierr = MatSeqAIJSetPreallocation(*mat, > 0,PETSC_NULL);CHKERRQ(ierr); > 1569 > 1570 } else { > 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); > 1572 ierr = MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL, > 0,PETSC_NULL);CHKERRQ(ierr); > 1573 } > > PCApply is fast when running. However, MatSetValues() is very very > slow when some arraies need to set. I find that the problem likely > lies in > MatSeqXAIJReallocateAIJ(A,A->rmap.n, > 1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) in > MatSetValues_SeqAIJ() after debugging the codes. > I can't further figure out where is the problem. Because it is > difficult to debug. Could you give me some advice? thanks a lot. > > the version of PETSc is 2.3.3-p8. > > thanks a lot. > > Regards, > Yujie > > > > From dave.mayhem23 at gmail.com Fri May 2 18:19:00 2008 From: dave.mayhem23 at gmail.com (Dave May) Date: Sat, 3 May 2008 09:19:00 +1000 Subject: further about PCComputeExplicitOperator() In-Reply-To: References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> Hi Barry, Does PCComputeExplicitOperator() really build B*A or A*B? The above code appears to just be applying the preconditioner to the each column of the identity matrix, with no reference to the original operator or the preconditioner side. I only wanted to clarify this fact as at one stage I wrote a function to compute B*A or A*B as I had convinced myself that PCComputeExplicitOperator() just assembled the inverse of the preconditioner. Cheers, Dave. On Sat, May 3, 2008 at 2:50 AM, Barry Smith wrote: > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes no > sense to store in > sparse format except for fun. 
> > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From recrusader at gmail.com Fri May 2 18:26:15 2008 From: recrusader at gmail.com (Yujie) Date: Fri, 2 May 2008 16:26:15 -0700 Subject: further about PCComputeExplicitOperator() In-Reply-To: References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: <7ff0ee010805021626r33afdd02x7db87d6c05a6b521@mail.gmail.com> Dear Barry: "Note also: PCComputeExplicitOperator - Computes the explicit preconditioned operator. this means it computes B*A or A*B (depending on left or right preconditioning). This beasty (unless you use Jacobi preconditioning) is always dense and it makes no sense to store in sparse format except for fun." Regarding the above comments, I have checked the array of the explicit preconditioner obtained in PCComputeExplicitOperator(). It is sparse. Why say it is dense? thanks a lot. Regards, Yujie On Fri, May 2, 2008 at 9:50 AM, Barry Smith wrote: > > Application of a preconditioner is almost always a dense operator. The > only exception is sparse > approximate inverse and in fact this is why the sparse approximate inverse > is a lousy preconditioner. > So I don't think you would ever want to compute the preconditioner into a > sparse matrix. > > If you want the SPAI preconditioner explicitly as a PETSc sparse matrix > you should go into that > code, figure out how the SPAI stores its computed preconditioner and > convert it to the PETSc > format. > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes no > sense to store in > sparse format except for fun. > > Barry > > > > On May 1, 2008, at 8:08 PM, Yujie wrote: > > when 1 processor is used, the matrix M in PCComputeExplicitOperator(pc,&M) > > uses MATSEQDENSE type. Now, I want to use MATSEQAIJ, I change the codes as > > follows: > > 1563 if (size == 1) { > > 1564 //05/01/08 > > 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); > > 1566 //ierr = > > MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); > > 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); > > 1568 ierr = MatSeqAIJSetPreallocation(*mat,0,PETSC_NULL);CHKERRQ(ierr); > > 1569 > > 1570 } else { > > 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); > > 1572 ierr = > > MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL,0,PETSC_NULL);CHKERRQ(ierr); > > 1573 } > > > > PCApply is fast when running. However, MatSetValues() is very very slow > > when some arraies need to set. I find that the problem likely lies in > > MatSeqXAIJReallocateAIJ(A,A->rmap.n,1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) > > in MatSetValues_SeqAIJ() after debugging the codes. > > I can't further figure out where is the problem. Because it is difficult > > to debug. Could you give me some advice? thanks a lot. > > > > the version of PETSc is 2.3.3-p8. > > > > thanks a lot. > > > > Regards, > > Yujie > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
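One way to make the sparse-or-dense question above concrete is to count how many entries of the explicit operator actually exceed a tolerance. A rough single-process sketch, assuming the unmodified PETSc where M comes back as MATSEQDENSE; pc and the 1e-12 cutoff are placeholders:

Mat            M;
PetscScalar    *vals;
PetscInt       m,n,i,nnz = 0;
PetscErrorCode ierr;

ierr = PCComputeExplicitOperator(pc,&M);CHKERRQ(ierr);   /* MATSEQDENSE on one process */
ierr = MatGetSize(M,&m,&n);CHKERRQ(ierr);
ierr = MatGetArray(M,&vals);CHKERRQ(ierr);               /* dense storage, column major */
for (i=0; i<m*n; i++) if (PetscAbsScalar(vals[i]) > 1.e-12) nnz++;
ierr = MatRestoreArray(M,&vals);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_SELF,"%d of %d entries exceed the cutoff\n",nnz,m*n);CHKERRQ(ierr);
ierr = MatDestroy(M);CHKERRQ(ierr);

How many entries fall below such a cutoff depends on the preconditioner and the problem; a small count does not change the storage cost of the dense format, but it does quantify the observation made above.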
URL: From bsmith at mcs.anl.gov Sat May 3 14:42:59 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 3 May 2008 15:42:59 -0400 Subject: further about PCComputeExplicitOperator() In-Reply-To: <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> Message-ID: <7E0A3675-0B00-49D3-B50F-7E64E07F5144@mcs.anl.gov> On May 2, 2008, at 7:19 PM, Dave May wrote: > Hi Barry, > Does PCComputeExplicitOperator() really build B*A or A*B? > The above code appears to just be applying the preconditioner to the > each column of the identity matrix, with no reference to the > original operator or the preconditioner side. Dave, You are correct; I apologize. There is a KSPComputeExplicitOperator() that computes the preconditioned operator (B*A or A*B) depends on the KSP preconditioner sider. The manual page for PCComputeExplicitOperator() does not mention KSPComputeExplicitOperator(), I have rectified this. Barry > > > I only wanted to clarify this fact as at one stage I wrote a > function to compute B*A or A*B as I had convinced myself that > PCComputeExplicitOperator() just assembled the inverse of the > preconditioner. > > Cheers, > Dave. > > > > On Sat, May 3, 2008 at 2:50 AM, Barry Smith > wrote: > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned > operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes > no sense to store in > sparse format except for fun. > > Barry > From keita at cray.com Mon May 5 12:20:25 2008 From: keita at cray.com (Keita Teranishi) Date: Mon, 5 May 2008 12:20:25 -0500 Subject: C++ and Fortran Support Message-ID: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> Hi, Just a quick question. Can PETSc support both C++ and Fortran interface together with a single bmake? Thanks, ================================ Keita Teranishi Math Software Group Cray, Inc. keita at cray.com ================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon May 5 12:50:29 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 5 May 2008 12:50:29 -0500 (CDT) Subject: C++ and Fortran Support In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> References: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> Message-ID: On Mon, 5 May 2008, Keita Teranishi wrote: > Hi, > > > > Just a quick question. Can PETSc support both C++ and Fortran interface together with a single bmake? yes. the fortran interface will always be built [as long as '--with-fc=' is specified] - irrespective of -with-clanguage=c [or cxx] Satish From jagruti.trivedi at navy.mil Mon May 5 13:10:51 2008 From: jagruti.trivedi at navy.mil (Trivedi, Jagruti CIV 470000D, 474300D) Date: Mon, 5 May 2008 11:10:51 -0700 Subject: Unsubscribe me Message-ID: Unsubscribe me from email alias -------------- next part -------------- An HTML attachment was scrubbed... 
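On the build question just above: the Fortran interface and the C/C++ language choice are controlled by separate configure options, so a single build can provide both. A sketch of the kind of configure line involved (compiler names are placeholders; the option names match those cited in the reply that follows, and the invocation is the 2.3.x-era python configure):

./config/configure.py --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-clanguage=cxx

The same PETSC_ARCH then serves C++ and Fortran application code alike.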
URL: From knepley at gmail.com Mon May 5 13:14:30 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 May 2008 13:14:30 -0500 Subject: Unsubscribe me In-Reply-To: References: Message-ID: On Mon, May 5, 2008 at 1:10 PM, Trivedi, Jagruti CIV 470000D, 474300D wrote: > > Unsubscribe me from email alias You can actually unsubscribe yourself from petsc-users by sending a mail to majordomo. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stephane.aubert at fluorem.com Tue May 6 12:20:07 2008 From: stephane.aubert at fluorem.com (Stephane Aubert) Date: Tue, 06 May 2008 19:20:07 +0200 Subject: How to implement a block version of MatDiagonalScale? Message-ID: <48209347.1060500@fluorem.com> Hello, I'm "playing" with badly conditioned block matrices (MPIBAIJ type) arising from turbulent compressible fluid dynamics (RANS eqs, block size = 5 (laminar) or 7 (turbulent)). I tested that using bi-normalization approach to build l and r vectors, then calling MatDiagonalScale(A,l,r), improve the accuracy and the convergence of GMRES+ILU(0), basically by reducing the dependence to the mesh cells size and to the variables magnitudes (in particular for k-w turbulence model). Now, I would like to go further and use L and R block diagonal matrices, instead of vectors, for example to get block identity along the diagonal of L.A.R. The sparcity of A is preserved, from a block point of view. My first idea is to use a LU factorization of Aii, with L=Li^(-1), R=Ui^(-1), Aii=Li.Ui. My question is: How to compute L and R using PETSC available functions in a clever way, instead of calling some LAPACK functions after getting individual diagonal blocks in my own piece of code? My actual guess is: 1. Create a new matrix D with MPIBDIAG type, from the block-diagonal of A using MatGetSubMatrices(), to do something like MatGetDiagonal(). 2. Create a PC of type PCLU using D as operators, with the sequence PCCreate/PCSetType/PCSetOperators/PCSetUp. 3. Get Li and Ui. This is where I'm glued: I can't figure out how to use PCGetFactoredMatrix() to get Li and Ui... Is MatLUFactor() a better candidate? 4. Do I need to compute Li^(-1) and Ui^(-1), or is there some "backsubstitution" functions available? With many thanks for your suggestions, Stef. -- ___________________________________________________________ Dr. Stephane AUBERT, CEO & CTO FLUOREM s.a.s Centre Scientifique Auguste MOIROUX 64 chemin des MOUILLES F-69130 ECULLY, FRANCE International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 email: stephane.aubert at fluorem.com web: www.fluorem.com From Amit.Itagi at seagate.com Tue May 6 12:53:10 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 6 May 2008 13:53:10 -0400 Subject: DA question In-Reply-To: <1D003C34-5E65-4340-98EC-8274AA32BA16@mcs.anl.gov> Message-ID: > > One question : How compatible is PetSc with Blitz++ ? Can I declare > > the > > array to be returned by DAVecGetArray to be a Blitz array ? > > Likely you would need to use VecGetArray() and then somehow build > the Blitz > array using the pointer returned and the sizes of the local part of > the DA. > > If you figure out how to do this then maybe we could have a > DAVecGetArrayBlitz() > > Barry > Barry, I did some more thinking about this. 
If I have a standard C array (any dimension) that is stored in a contiguous block of memory (with regular ordering), there is a Blitz constructor that can convert it to a Blitz array. I took a look at the source of DAVecGetArray. The array generation depends (in the VecGetArray3d source) on VecGetArray, and a short code that allocates storage to store the pointers and to do the pointer assignments to appropriate parts of VecGetArray. Looking at the assignments, it looks like the 3D array returned by DAVecGetArray has the contiguous, regularly ordered storage format that Blitz expects. Is this correct ? Thanks Rgds, Amit From knepley at gmail.com Tue May 6 14:13:13 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 May 2008 14:13:13 -0500 Subject: DA question In-Reply-To: References: <1D003C34-5E65-4340-98EC-8274AA32BA16@mcs.anl.gov> Message-ID: On Tue, May 6, 2008 at 12:53 PM, wrote: > > > One question : How compatible is PetSc with Blitz++ ? Can I declare > > > the > > > array to be returned by DAVecGetArray to be a Blitz array ? > > > > Likely you would need to use VecGetArray() and then somehow build > > the Blitz > > array using the pointer returned and the sizes of the local part of > > the DA. > > > > If you figure out how to do this then maybe we could have a > > DAVecGetArrayBlitz() > > > > Barry > > > > > Barry, > > I did some more thinking about this. If I have a standard C array (any > dimension) that is stored in a contiguous block of memory (with regular > ordering), there is a Blitz constructor that can convert it to a Blitz > array. > > I took a look at the source of DAVecGetArray. The array generation depends > (in the VecGetArray3d source) on VecGetArray, and a short code that > allocates storage to store the pointers and to do the pointer assignments > to appropriate parts of VecGetArray. Looking at the assignments, it looks > like the 3D array returned by DAVecGetArray has the contiguous, regularly > ordered storage format that Blitz expects. Is this correct ? All DA storage is just contiguous blocks of memory, like a PETSc Vec. Matt > Thanks > > Rgds, > Amit -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Tue May 6 15:47:37 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 May 2008 15:47:37 -0500 Subject: DA question In-Reply-To: References: Message-ID: <8C9938D0-AE7F-4EE2-874E-E42285797634@mcs.anl.gov> On May 6, 2008, at 12:53 PM, Amit.Itagi at seagate.com wrote: > > > >>> One question : How compatible is PetSc with Blitz++ ? Can I declare >>> the >>> array to be returned by DAVecGetArray to be a Blitz array ? >> >> Likely you would need to use VecGetArray() and then somehow build >> the Blitz >> array using the pointer returned and the sizes of the local part of >> the DA. >> >> If you figure out how to do this then maybe we could have a >> DAVecGetArrayBlitz() >> >> Barry >> > > > Barry, > > I did some more thinking about this. If I have a standard C array (any > dimension) that is stored in a contiguous block of memory (with > regular > ordering), there is a Blitz constructor that can convert it to a Blitz > array. > > I took a look at the source of DAVecGetArray. The array generation > depends > (in the VecGetArray3d source) on VecGetArray, and a short code that > allocates storage to store the pointers and to do the pointer > assignments > to appropriate parts of VecGetArray. 
Looking at the assignments, it > looks > like the 3D array returned by DAVecGetArray has the contiguous, > regularly > ordered storage format that Blitz expects. Is this correct ? The value returned by VecGetArray() returns a simple contiguous, regularly > > ordered storage format that Blitz expects. I highly recommend you > simply call the VecGetArray() and then the Blitz constructor; there is absolutely no reason to use the 3d array returned by DAVecGetArray() for this. Barry > > > Thanks > > Rgds, > Amit > From bsmith at mcs.anl.gov Tue May 6 15:57:35 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 May 2008 15:57:35 -0500 Subject: How to implement a block version of MatDiagonalScale? In-Reply-To: <48209347.1060500@fluorem.com> References: <48209347.1060500@fluorem.com> Message-ID: <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> Stef, You do not want to do it either way you have outlined below. Since the blocks are of size 5 or 7 you DO NOT EVER IN A BILLION YEARS WANT to use LAPACK/BLAS to do the little factorizations and solves. The overhead of the LAPACK/BLAS will kill performance for that size array. Similarly using the PETSc Mat objects and PC objects are not suitable for those tiny matrices. Here is what I would do. Take a look at the function MatLUFactorNumeric_SeqBAIJ_5() in the file src/mat/impls/baij/seq/ baijfact9.c Note it uses Kernel_A_gets_inverse_A_5() to invert the little 5 by 5 blocks. Also it uses inlined code to do the little 5 by 5 matrix matrix multiplies. (Yes for this size problem you do want to invert the little 5 by 5 matrices and do matrix matrix products instead of only doing 5 by 5 LU factorizations and lots of triangular solves; it is much faster this way.) I think by looking at this subroutine you can figure out how to loop over the nonzero blocks of A and multiply by the appropriate diagonal blocks you have inverted to obtain the new block diagonal scaled A matrix. Barry We would like to include the resulting code in PETSc if you would like to donate it. Thanks On May 6, 2008, at 12:20 PM, Stephane Aubert wrote: > Hello, > I'm "playing" with badly conditioned block matrices (MPIBAIJ type) > arising from turbulent compressible fluid dynamics (RANS eqs, block > size = 5 (laminar) or 7 (turbulent)). > I tested that using bi-normalization approach to build l and r > vectors, then calling MatDiagonalScale(A,l,r), improve the accuracy > and the convergence of GMRES+ILU(0), basically by reducing the > dependence to the mesh cells size and to the variables magnitudes > (in particular for k-w turbulence model). > Now, I would like to go further and use L and R block diagonal > matrices, instead of vectors, for example to get block identity > along the diagonal of L.A.R. The sparcity of A is preserved, from a > block point of view. > My first idea is to use a LU factorization of Aii, with L=Li^(-1), > R=Ui^(-1), Aii=Li.Ui. > > My question is: How to compute L and R using PETSC available > functions in a clever way, instead of calling some LAPACK functions > after getting individual diagonal blocks in my own piece of code? > > My actual guess is: > > 1. Create a new matrix D with MPIBDIAG type, from the block-diagonal > of A using MatGetSubMatrices(), to do something like > MatGetDiagonal(). > 2. Create a PC of type PCLU using D as operators, with the sequence > PCCreate/PCSetType/PCSetOperators/PCSetUp. > 3. Get Li and Ui. This is where I'm glued: I can't figure out how to > use PCGetFactoredMatrix() to get Li and Ui... 
Is MatLUFactor() a > better candidate? > 4. Do I need to compute Li^(-1) and Ui^(-1), or is there some > "backsubstitution" functions available? > > With many thanks for your suggestions, > Stef. > > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > From Amit.Itagi at seagate.com Tue May 6 16:04:03 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 6 May 2008 17:04:03 -0400 Subject: DA question In-Reply-To: <8C9938D0-AE7F-4EE2-874E-E42285797634@mcs.anl.gov> Message-ID: Barry, You are right. The VecGetArray followed by the Blitz constructor works fine. Thanks Rgds, Amit owner-petsc-users at mcs.anl.gov wrote on 05/06/2008 04:47:37 PM: > > On May 6, 2008, at 12:53 PM, Amit.Itagi at seagate.com wrote: > > > > > > > > >>> One question : How compatible is PetSc with Blitz++ ? Can I declare > >>> the > >>> array to be returned by DAVecGetArray to be a Blitz array ? > >> > >> Likely you would need to use VecGetArray() and then somehow build > >> the Blitz > >> array using the pointer returned and the sizes of the local part of > >> the DA. > >> > >> If you figure out how to do this then maybe we could have a > >> DAVecGetArrayBlitz() > >> > >> Barry > >> > > > > > > Barry, > > > > I did some more thinking about this. If I have a standard C array (any > > dimension) that is stored in a contiguous block of memory (with > > regular > > ordering), there is a Blitz constructor that can convert it to a Blitz > > array. > > > > I took a look at the source of DAVecGetArray. The array generation > > depends > > (in the VecGetArray3d source) on VecGetArray, and a short code that > > allocates storage to store the pointers and to do the pointer > > assignments > > to appropriate parts of VecGetArray. Looking at the assignments, it > > looks > > like the 3D array returned by DAVecGetArray has the contiguous, > > regularly > > ordered storage format that Blitz expects. Is this correct ? > > The value returned by VecGetArray() returns a simple contiguous, > regularly > > > > ordered storage format that Blitz expects. I highly recommend you > > simply > call the VecGetArray() and then the Blitz constructor; there is > absolutely no reason > to use the 3d array returned by DAVecGetArray() for this. > > Barry > > > > > > > Thanks > > > > Rgds, > > Amit > > > From matthew.gross1 at navy.mil Wed May 7 14:02:26 2008 From: matthew.gross1 at navy.mil (Gross, Matthew CIV NAVAIR, 474200D) Date: Wed, 7 May 2008 15:02:26 -0400 Subject: Unsubscribe me In-Reply-To: References: Message-ID: Unsubscribe me from email alias From neckel at in.tum.de Thu May 8 09:07:32 2008 From: neckel at in.tum.de (Tobias Neckel) Date: Thu, 08 May 2008 16:07:32 +0200 Subject: Question on matrix preallocation Message-ID: <48230924.7040600@in.tum.de> Hello, when using petsc (version 2.3.2 on a linux 32bit Intel architecture) to set up a serial sparse linear system of equations, I recently noticed the well-known allocation performance problem: The matrix setup needs more memory than preallocated with a fixed number of column entries for all rows. 
Thus, I switched to the strategy described in the Users Manual (first counting the number of matrix entries for each row individually and then using the nnz parameter in MatCreateSeqAIJ()). But this did not change the dynamic allocation behaviour at all. Therefore, I tried to bring everything down to a (very) small test example. I set up a nnz-1D-array of type int and length 4 which holds the number of expected non-zero column entries for each row of a matrix (in particular 2 columns in row 0). Using this nnz-array, I create a 4x4 matrix. Afterwards, I set the entry (0,0) of the matrix to a non-zero value. The source code part for this simple test can be found in the attached file testMatPreallocation.cpp. When I run this test (with the additional -info runtime option), the one and only matrix entry setting results in an additional memory allocation (see attached file commandLineOutput.txt)! This is quite surprising, as I would have expected enough preallocated memory for the matrix, which is also visible from the output. Am I misusing or missing something necessary to make the preallocation work? Thanks in advance for any hints, best regards Tobias Neckel -- Dipl.-Tech. Math. Tobias Neckel Institut f?r Informatik V, TU M?nchen Boltzmannstr. 3, 85748 Garching Tel.: 089/289-18602 Email: neckel at in.tum.de URL: http://www5.in.tum.de/persons/neckel.html -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: commandLineOutput.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: testMatPreallocation.cpp Type: text/x-c++src Size: 1693 bytes Desc: not available URL: From knepley at gmail.com Thu May 8 09:43:19 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 8 May 2008 09:43:19 -0500 Subject: Question on matrix preallocation In-Reply-To: <48230924.7040600@in.tum.de> References: <48230924.7040600@in.tum.de> Message-ID: You are assembling before inserting values. This wipes out the preallocation information since assembly shrinks the matrix to an optimal size. Matt On Thu, May 8, 2008 at 9:07 AM, Tobias Neckel wrote: > Hello, > > when using petsc (version 2.3.2 on a linux 32bit Intel architecture) to set > up a serial sparse linear system of equations, I recently noticed the > well-known allocation performance problem: The matrix setup needs more > memory than preallocated with a fixed number of column entries for all rows. > Thus, I switched to the strategy described in the Users Manual (first > counting the number of matrix entries for each row individually and then > using the nnz parameter in MatCreateSeqAIJ()). But this did not change the > dynamic allocation behaviour at all. > > Therefore, I tried to bring everything down to a (very) small test example. > I set up a nnz-1D-array of type int and length 4 which holds the number of > expected non-zero column entries for each row of a matrix (in particular 2 > columns in row 0). Using this nnz-array, I create a 4x4 matrix. Afterwards, > I set the entry (0,0) of the matrix to a non-zero value. > The source code part for this simple test can be found in the attached file > testMatPreallocation.cpp. > > When I run this test (with the additional -info runtime option), the one > and only matrix entry setting results in an additional memory allocation > (see attached file commandLineOutput.txt)! > > This is quite surprising, as I would have expected enough preallocated > memory for the matrix, which is also visible from the output. 
Am I misusing > or missing something necessary to make the preallocation work? > > Thanks in advance for any hints, > best regards > Tobias Neckel > > -- > Dipl.-Tech. Math. Tobias Neckel > > Institut f?r Informatik V, TU M?nchen > Boltzmannstr. 3, 85748 Garching > > Tel.: 089/289-18602 > Email: neckel at in.tum.de > URL: http://www5.in.tum.de/persons/neckel.html > > 14:33:53 debug petsc::PETScLibTest::testMatPreallocation() > start PETSc mat preallocation test > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 > max tags = 2147483647 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4 X 4; storage space: 10 > unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] Mat_CheckInode(): Found 1 nodes of 4. Limit used: 5. Using Inode > routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4 X 4; storage space: 14 > unneeded,1 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 1 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > [0] Mat_CheckInode(): Found 2 nodes of 4. Limit used: 5. Using Inode > routines > 14:33:53 debug petsc::PETScLibTest::testMatPreallocation() > stop PETSc mat preallocation test > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Thu May 8 13:58:03 2008 From: w_subber at yahoo.com (Waad Subber) Date: Thu, 8 May 2008 11:58:03 -0700 (PDT) Subject: Cannot use PETSC_DEFAULT in MatMatMult Message-ID: <132575.84711.qm@web38205.mail.mud.yahoo.com> Hi, I am trying to multiply to spares matrices. I am using PETSC_DEFAULT for the fill ratio; however, I cannot compile the code I get the following error message : Error: master.F, line 55: This name does not have a type, and must have an explicit type. [PETSC_DEFAULT] call MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,C,ierr) ----------------------------------------------------------^ compilation aborted for master.F (code 1) Thanks Waad --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 8 14:15:19 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 8 May 2008 14:15:19 -0500 Subject: Cannot use PETSC_DEFAULT in MatMatMult In-Reply-To: <132575.84711.qm@web38205.mail.mud.yahoo.com> References: <132575.84711.qm@web38205.mail.mud.yahoo.com> Message-ID: <4388AB93-EABA-434E-B5EE-5A6682804D46@mcs.anl.gov> In Fortran you must use PETSC_DEFAULT_DOUBLE_PRECISION. I'll make sure this is may clear in the docs Barry There is also a PETSC_DEFAULT_INTEGER for Fortran On May 8, 2008, at 1:58 PM, Waad Subber wrote: > Hi, > > I am trying to multiply to spares matrices. I am using PETSC_DEFAULT > for the fill ratio; however, I cannot compile the code I get the > following error message : > > Error: master.F, line 55: This name does not have a type, and must > have an explicit type. [PETSC_DEFAULT] > call MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,C,ierr) > ----------------------------------------------------------^ > compilation aborted for master.F (code 1) > > Thanks > > Waad > > > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
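Returning to the matrix preallocation thread above, a minimal sketch of the ordering Matt describes - create with the nnz array, insert everything, and assemble exactly once at the end - using the same 4x4 toy dimensions; this is an illustration, not the original testMatPreallocation.cpp:

Mat            A;
PetscInt       nnz[4] = {2,1,1,1};   /* expected nonzeros per row */
PetscInt       row = 0, col = 0;
PetscScalar    val = 1.0;
PetscErrorCode ierr;

ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,4,4,0,nnz,&A);CHKERRQ(ierr);
/* insert all values before the first assembly; assembling first discards
   the preallocated space, which is what produced the malloc in the -info log */
ierr = MatSetValues(A,1,&row,1,&col,&val,INSERT_VALUES);CHKERRQ(ierr);
/* ... remaining MatSetValues() calls ... */
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);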
From griffith at cims.nyu.edu Thu May 8 18:00:07 2008 From: griffith at cims.nyu.edu (Boyce Griffith) Date: Thu, 08 May 2008 19:00:07 -0400 Subject: VecMultiVec? Message-ID: <482385F7.3040902@cims.nyu.edu> Hi, Folks -- I'm pretty sure this has been discussed in the list previously, but I'm having trouble digging up the thread in the archive, so apologies in advance... I need to solve some nonlinear equations that involve both PETSc Vecs as well as data that is stored in a non-PETSc-native format, and I was wondering if someone happens to have a freely available implementation of a "VecMultiVec" --- a vector which contains multiple Vec objects. I think it should be fairly straightforward to implement such a beast, but I thought I'd ask before doing it myself. Thanks, -- Boyce From zonexo at gmail.com Fri May 9 07:33:21 2008 From: zonexo at gmail.com (Ben Tay) Date: Fri, 09 May 2008 20:33:21 +0800 Subject: How to efficiently change just the diagonal vector in a matrix at every time step Message-ID: <48244491.2030303@gmail.com> Hi, I have a matrix and I inserted all the relevant values during the 1st step. I'll then solve it. For the subsequent steps, I only need to change the diagonal vector of the matrix before solving. I wonder how I can do it efficiently. Of course, the RHS vector also change but I've not included them here. I set these at the 1st step: call KSPSetOperators(ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) ksptype=KSPRICHARDSON call KSPSetType(ksp_semi_x,ksptype,ierr) ptype = PCILU call PCSetType(pc_semi_x,ptype,ierr) call KSPSetFromOptions(ksp_semi_x,ierr) call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) tol=1.e-5 call KSPSetTolerances(ksp_semi_x,tol,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) and what I did at the subsequent steps is: do II=1,total call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) end do call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) I realise that the answers are slightly different as compared to calling all the options such as KSPSetType, KSPSetFromOptions, KSPSetTolerances at every time step. Should that be so? Is this the best way? Also, I can let the matrix be equal at every time step by fixing the delta_time. However, it may give stability problems. I wonder how expensive is these type of value changing and assembly for a matrix? Thank you very much. Regards. From sdettrick at gmail.com Fri May 9 07:50:30 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Fri, 9 May 2008 14:50:30 +0200 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <48244491.2030303@gmail.com> References: <48244491.2030303@gmail.com> Message-ID: <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> One way to do it is to have two Mats, A and B, and a Vec, D, to store the diagonal. A is constructed only on the first step. On subsequent steps, A is copied into B, and then D is added to the diagonal: ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); ierr = MatDiagonalSet( B, D, ADD_VALUES ); The KSP uses B as the matrix, not A. I don't know if this approach is efficient or not. Can anybody comment? Thanks, Sean On May 9, 2008, at 2:33 PM, Ben Tay wrote: > Hi, > I have a matrix and I inserted all the relevant values during the > 1st step. I'll then solve it. 
For the subsequent steps, I only need > to change the diagonal vector of the matrix before solving. I wonder > how I can do it efficiently. Of course, the RHS vector also change > but I've not included them here. > > I set these at the 1st step: > > call > KSPSetOperators > (ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) > > call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) > > ksptype=KSPRICHARDSON > > call KSPSetType(ksp_semi_x,ksptype,ierr) > > ptype = PCILU > > call PCSetType(pc_semi_x,ptype,ierr) > > call KSPSetFromOptions(ksp_semi_x,ierr) > > call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) > > tol=1.e-5 > > call > KSPSetTolerances > (ksp_semi_x > ,tol > ,PETSC_DEFAULT_DOUBLE_PRECISION > ,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > From bsmith at mcs.anl.gov Fri May 9 10:48:22 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 May 2008 10:48:22 -0500 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <48244491.2030303@gmail.com> References: <48244491.2030303@gmail.com> Message-ID: <1FCC7926-A348-4191-8683-DF3E23BB15B3@mcs.anl.gov> > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) You can/should call KSPSetOperators() here EACH time, otherwise the KSPSolve does not know it has a new matrix and will not build a new preconditioner. Hence it continues to use the original preconditioner. This is why "answers are slightly different". If the original preconditioner works ok for all timesteps then you do not need to call the KSPSetOperators() Barry > > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > From Amit.Itagi at seagate.com Fri May 9 12:46:59 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Fri, 9 May 2008 13:46:59 -0400 Subject: Code structuring - Communicator Message-ID: Hi, I have a question about the Petsc communicator. I have a petsc program "foo" which essentially runs in parallel and gives me y=f(x1,x2,...), where y is an output parameter and xi's are input parameters. 
Suppose, I want to run a parallel optimizer for the input parameters. I am looking at the following functionality. I submit the optimizer job on 16 processors (using "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of "foo", each running parallely on 4 processors. "foo" will be written as a function and not as a main program in this case. How can I get this functionality using Petsc ? Should PetscInitialize be called in the optimizer, or in each foo run ? If PetscInitialize is called in the optimizer, is there a way to make the foo function run only on a subset of the 16 processors ? May be, I haven't done a good job of explaining my problem. Let me know if you need any clarifications. Thanks Rgds, Amit From bsmith at mcs.anl.gov Fri May 9 14:07:10 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 May 2008 14:07:10 -0500 Subject: Code structuring - Communicator In-Reply-To: References: Message-ID: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. "foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. 
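A rough outline of the first approach Barry describes above - split PETSC_COMM_WORLD into subcommunicators and create every object on the subcommunicator - where the group size of 4 and the vector length are placeholders and foo() stands for the user's solver:

#include "petscvec.h"

int main(int argc,char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank,color;
  Vec            x;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  color = rank/4;                              /* 16 ranks -> 4 groups of 4 */
  ierr = MPI_Comm_split(PETSC_COMM_WORLD,color,rank,&subcomm);CHKERRQ(ierr);

  /* inside each group, foo() creates its objects on subcomm, never on
     PETSC_COMM_WORLD, for example: */
  ierr = VecCreateMPI(subcomm,PETSC_DECIDE,1000,&x);CHKERRQ(ierr);
  /* ... assemble, solve, and reduce the objective value for this group ... */

  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The optimizer layer still communicates across PETSC_COMM_WORLD (for instance to gather the four objective values), while each instance of foo() stays confined to its own subcommunicator.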
> > Thanks > > Rgds, > Amit > From zonexo at gmail.com Sat May 10 03:38:30 2008 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 May 2008 16:38:30 +0800 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> References: <48244491.2030303@gmail.com> <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> Message-ID: <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> Hi Sean, Maybe for me, I can just insert vector diagonal D into the matrix A and call Assembly and KSP at every time step. Should that be better since there is no need to copy A into B? Thanks! On Fri, May 9, 2008 at 8:50 PM, Sean Dettrick wrote: > > One way to do it is to have two Mats, A and B, and a Vec, D, to store the > diagonal. A is constructed only on the first step. On subsequent steps, A > is copied into B, and then D is added to the diagonal: > > ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); > ierr = MatDiagonalSet( B, D, ADD_VALUES ); > > The KSP uses B as the matrix, not A. > > I don't know if this approach is efficient or not. Can anybody comment? > > Thanks, > Sean > > > > > On May 9, 2008, at 2:33 PM, Ben Tay wrote: > > Hi, >> I have a matrix and I inserted all the relevant values during the 1st >> step. I'll then solve it. For the subsequent steps, I only need to change >> the diagonal vector of the matrix before solving. I wonder how I can do it >> efficiently. Of course, the RHS vector also change but I've not included >> them here. >> >> I set these at the 1st step: >> >> call >> KSPSetOperators(ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) >> >> call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) >> >> ksptype=KSPRICHARDSON >> >> call KSPSetType(ksp_semi_x,ksptype,ierr) >> >> ptype = PCILU >> >> call PCSetType(pc_semi_x,ptype,ierr) >> >> call KSPSetFromOptions(ksp_semi_x,ierr) >> >> call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) >> >> tol=1.e-5 >> >> call >> KSPSetTolerances(ksp_semi_x,tol,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) >> >> and what I did at the subsequent steps is: >> >> do II=1,total >> call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) >> >> end do >> >> call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) >> >> call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) >> >> call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) >> >> I realise that the answers are slightly different as compared to calling >> all the options such as KSPSetType, KSPSetFromOptions, KSPSetTolerances at >> every time step. Should that be so? Is this the best way? >> >> Also, I can let the matrix be equal at every time step by fixing the >> delta_time. However, it may give stability problems. I wonder how expensive >> is these type of value changing and assembly for a matrix? >> >> Thank you very much. >> >> Regards. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sat May 10 16:17:49 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 10 May 2008 18:17:49 -0300 Subject: VecMultiVec? In-Reply-To: <482385F7.3040902@cims.nyu.edu> References: <482385F7.3040902@cims.nyu.edu> Message-ID: Depending on what you need to actually do with your multiple vectors, PETSc do have a 'Vecs' (note de final 's') wich is just an array with Vec items... However, as you are working with non-native data, perhaps this will not be useful.. Take a look... 
On 5/8/08, Boyce Griffith wrote: > Hi, Folks -- > > I'm pretty sure this has been discussed in the list previously, but I'm > having trouble digging up the thread in the archive, so apologies in > advance... > > I need to solve some nonlinear equations that involve both PETSc Vecs as > well as data that is stored in a non-PETSc-native format, and I was > wondering if someone happens to have a freely available implementation of a > "VecMultiVec" --- a vector which contains multiple Vec objects. I think it > should be fairly straightforward to implement such a beast, but I thought > I'd ask before doing it myself. > > Thanks, > > -- Boyce > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From sdettrick at gmail.com Mon May 12 05:42:47 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Mon, 12 May 2008 12:42:47 +0200 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> References: <48244491.2030303@gmail.com> <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> Message-ID: <210E7E60-4FEB-41AD-9DE9-C9ACBB6AFB05@gmail.com> On May 10, 2008, at 10:38 AM, Ben Tay wrote: > Hi Sean, > > Maybe for me, I can just insert vector diagonal D into the matrix A > and call Assembly and KSP at every time step. Should that be better > since there is no need to copy A into B? > > Thanks! Yes, that sounds like it would be faster. I suppose you would use INSERT_VALUES rather than ADD_VALUES. In my case I copy the whole matrix because constructing the time- constant part of the diagonal is very complicated. But now that you mention it, I could store both the time-constant and time-varying diagonal components in two separate Vecs, and only have one Mat, and then do MatDiagonalSet() twice at each timestep - the first time with INSERT_VALUES, the second with ADD_VALUES. That sounds like it would be faster. Thanks to you too, Sean > > > On Fri, May 9, 2008 at 8:50 PM, Sean Dettrick > wrote: > > One way to do it is to have two Mats, A and B, and a Vec, D, to > store the diagonal. A is constructed only on the first step. On > subsequent steps, A is copied into B, and then D is added to the > diagonal: > > ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); > ierr = MatDiagonalSet( B, D, ADD_VALUES ); > > The KSP uses B as the matrix, not A. > > I don't know if this approach is efficient or not. Can anybody > comment? > > Thanks, > Sean > > > > > On May 9, 2008, at 2:33 PM, Ben Tay wrote: > > Hi, > I have a matrix and I inserted all the relevant values during the > 1st step. I'll then solve it. For the subsequent steps, I only need > to change the diagonal vector of the matrix before solving. I wonder > how I can do it efficiently. Of course, the RHS vector also change > but I've not included them here. 
> > I set these at the 1st step: > > call > KSPSetOperators > (ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) > > call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) > > ksptype=KSPRICHARDSON > > call KSPSetType(ksp_semi_x,ksptype,ierr) > > ptype = PCILU > > call PCSetType(pc_semi_x,ptype,ierr) > > call KSPSetFromOptions(ksp_semi_x,ierr) > > call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) > > tol=1.e-5 > > call > KSPSetTolerances > (ksp_semi_x > ,tol > ,PETSC_DEFAULT_DOUBLE_PRECISION > ,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Amit.Itagi at seagate.com Mon May 12 08:27:50 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 09:27:50 -0400 Subject: Code structuring - Communicator In-Reply-To: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: Thanks, Barry. Rgds, Amit Barry Smith To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: Code structuring - Communicator 05/09/2008 03:07 PM Please respond to petsc-users at mcs.a nl.gov There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. 
"foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. > > Thanks > > Rgds, > Amit > From mfatenejad at wisc.edu Mon May 12 13:10:29 2008 From: mfatenejad at wisc.edu (Milad Fatenejad) Date: Mon, 12 May 2008 13:10:29 -0500 Subject: 2 Questions about DAs Message-ID: Hello: First, I'm having some email problems, so sorry if this shows up twice... I am using PETSc to write a large multi-physics finite difference code with a lot of opportunity for overlapping computation and communication. Right now, I have created ~100 petsc vectors for storing various quantities, which currently all share a single DA. The problem with this system is that I can only scatter one quantity at a time to update the values of the ghost points. If I try to scatter more than one object at a time, I get the following error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! It would be really nice to be able to start scattering a vector whenever I am done with a computation, and just finish the scatter whenever I need the vector again. Again, this is impossible because all of the vectors share the same DA. I then reorganized my code, so that each vector had its own DA, however, this led to the program running significantly more slowly (I assume this is just because I have so many vectors). So my first question is: Is there a way to organize the code so I can overlap the scattering of vectors without having a significant performance hit? And on a related note, many times I need to create arrays of vectors. I just discovered the function "VecDuplicateVecs" (and related functions), which look like performs this operation. Is this the best way to create arrays of vectors? Is there a way to directly get the array from the DA without having to create a vector and duplicate it (I don't see a "DACreateGlobalVectorS")? I know it is also possible to do something like this using the DOF parameter in the DACreate call as shown in: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Are there any advantages to using dof as opposed to VecDuplicateVecs, etc.? I'd appreciate any help Thank You Milad Fatenejad From mfatenejad at wisc.edu Mon May 12 11:02:33 2008 From: mfatenejad at wisc.edu (Milad Fatenejad) Date: Mon, 12 May 2008 11:02:33 -0500 Subject: 2 Questions about DAs Message-ID: Hello: I have two separate DA questions: 1) I am writing a large finite difference code and would like to be able to represent an array of vectors. I am currently doing this by creating a single DA and calling DACreateGlobalVector several times, but the manual also states that: "PETSc currently provides no container for multiple arrays sharing the same distributed array communication; note, however, that the dof parameter handles many cases of interest." 
I also found the following mailing list thread which describes how to use the dof parameter to represent several vectors: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Where the following solution is proposed: """ The easiest thing to do in C is to declare a struct: typedef struct { PetscScalar v[3]; PetscScalar p; } Space; and then cast pointers Space ***array; DAVecGetArray(da, u, (void *) &array); array[k][j][i].v *= -1.0; """ The problem with the proposed solution, is that they use a struct to get the individual values, but what if you don't know the number of degrees of freedom at compile time? So my question is two fold: a) Is there a problem with just having a single DA and calling DACreateGlobalVector multiple times? Does this affect performance at all (I have many different vectors)? b) Is there a way to use the dof parameter when creating a DA when the number of degrees of freedom is not known at compile time? Specifically, I would like to be able to access the individual values of the vector, just like the example shows... 2) The code I am writing has a lot of different parts which present a lot of opportunities to overlap communication an computation when scattering vectors to update values in the ghost points. Right now, all of my vectors (there are ~50 of them) share a single DA because they all have the same shape. However, by sharing a single DA, I can only scatter one vector at a time. It would be nice to be able to start scattering each vector right after I'm done computing it, and finish scattering it right before I need it again but I can't because other vectors might need to be scattered in between. I then re-wrote part of my code so that each vector had its own DA object, but this ended up being incredibly slow (I assume this is because I have so many vectors). My question is, is there a way to scatter multiple vectors simultaneously without affecting the performance of the code? Does it make sense to do this? I'd really appreciate any help... Thanks Milad Fatenejad From icksa1 at gmail.com Mon May 12 13:41:02 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 13:41:02 -0500 Subject: 2 Questions about DAs Message-ID: Hello: First, I'm having some email problems, so sorry if this shows up a few times... I am using PETSc to write a large multi-physics finite difference code with a lot of opportunity for overlapping computation and communication. Right now, I have created ~100 petsc vectors for storing various quantities, which currently all share a single DA. The problem with this system is that I can only scatter one quantity at a time to update the values of the ghost points. If I try to scatter more than one object at a time, I get the following error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! It would be really nice to be able to start scattering a vector whenever I am done with a computation, and just finish the scatter whenever I need the vector again. Again, this is impossible because all of the vectors share the same DA. I then reorganized my code, so that each vector had its own DA, however, this led to the program running significantly more slowly (I assume this is just because I have so many vectors). So my first question is: Is there a way to organize the code so I can overlap the scattering of vectors without having a significant performance hit? And on a related note, many times I need to create arrays of vectors. 
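For the arrays-of-vectors part of the question, VecDuplicateVecs() takes a run-time count; a small sketch (names hypothetical, argument orders as in PETSc 2.3.x):

/* Sketch: an "array of Vecs" of run-time length, all sharing one DA layout. */
PetscErrorCode CreateFieldArray(DA da, PetscInt nfields, Vec **fields)
{
  PetscErrorCode ierr;
  Vec            tmpl;

  PetscFunctionBegin;
  ierr = DACreateGlobalVector(da, &tmpl);CHKERRQ(ierr);
  ierr = VecDuplicateVecs(tmpl, nfields, fields);CHKERRQ(ierr); /* (*fields)[0..nfields-1] */
  ierr = VecDestroy(tmpl);CHKERRQ(ierr);   /* the template itself is no longer needed */
  PetscFunctionReturn(0);
}
/* Free later with VecDestroyVecs(*fields, nfields); note the argument order
   shown is the 2.3.x one and differs in newer PETSc releases. */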
I just discovered the function "VecDuplicateVecs" (and related functions), which look like performs this operation. Is this the best way to create arrays of vectors? Is there a way to directly get the array from the DA without having to create a vector and duplicate it (I don't see a "DACreateGlobalVectorS")? I know it is also possible to do something like this using the DOF parameter in the DACreate call as shown in: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Are there any advantages to using dof as opposed to VecDuplicateVecs, etc.? I'd appreciate any help Thank You Milad Fatenejad From Amit.Itagi at seagate.com Mon May 12 13:43:50 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 14:43:50 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: Milad, I have a DA with 6 vectors sharing the DA structure. I defined the DA to have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter the vectors separately. Things seem to work fine. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available 2 Questions about DAs 05/12/2008 12:02 PM Please respond to petsc-users at mcs.a nl.gov Hello: I have two separate DA questions: 1) I am writing a large finite difference code and would like to be able to represent an array of vectors. I am currently doing this by creating a single DA and calling DACreateGlobalVector several times, but the manual also states that: "PETSc currently provides no container for multiple arrays sharing the same distributed array communication; note, however, that the dof parameter handles many cases of interest." I also found the following mailing list thread which describes how to use the dof parameter to represent several vectors: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Where the following solution is proposed: """ The easiest thing to do in C is to declare a struct: typedef struct { PetscScalar v[3]; PetscScalar p; } Space; and then cast pointers Space ***array; DAVecGetArray(da, u, (void *) &array); array[k][j][i].v *= -1.0; """ The problem with the proposed solution, is that they use a struct to get the individual values, but what if you don't know the number of degrees of freedom at compile time? So my question is two fold: a) Is there a problem with just having a single DA and calling DACreateGlobalVector multiple times? Does this affect performance at all (I have many different vectors)? b) Is there a way to use the dof parameter when creating a DA when the number of degrees of freedom is not known at compile time? Specifically, I would like to be able to access the individual values of the vector, just like the example shows... 2) The code I am writing has a lot of different parts which present a lot of opportunities to overlap communication an computation when scattering vectors to update values in the ghost points. Right now, all of my vectors (there are ~50 of them) share a single DA because they all have the same shape. However, by sharing a single DA, I can only scatter one vector at a time. It would be nice to be able to start scattering each vector right after I'm done computing it, and finish scattering it right before I need it again but I can't because other vectors might need to be scattered in between. 
I then re-wrote part of my code so that each vector had its own DA object, but this ended up being incredibly slow (I assume this is because I have so many vectors). My question is, is there a way to scatter multiple vectors simultaneously without affecting the performance of the code? Does it make sense to do this? I'd really appreciate any help... Thanks Milad Fatenejad From knepley at gmail.com Mon May 12 13:56:51 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 13:56:51 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." > > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? It would be nice to get variable structs in C. However, you can just deference the object directly. For example, for 50 degrees of freedom, you can do array[k][j][i][47] *= -1.0; > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? These are all independent objects. Thus, by itself, creating any number of Vecs does nothing to performance (unless you start to run out of memory). > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... see above. > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). The problem here is that buffering will have to be done for each outstanding scatter. 
Thus I see two resolutions: 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. This is essentially what you accomplish with separate DAs. 2) You the dof method. However, this scatter ALL the vectors every time. I do not understand what performance problem you would have with multiple DAs. With any performance questions, we suggest sending the output of -log_summary so we have data to look at. Matt > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From icksa1 at gmail.com Mon May 12 13:58:35 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 13:58:35 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi, Thanks for the reply. If I have two global vectors from the same DA, say global1 and global2 and I try to scatter to local vectors local1 and local2 using the command DAGlobalToLocalBegin/End in the following manner: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); Everything is fine. If instead I do: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); I get the error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! What I would like to be able to do is overlap the scattering and that produces the error. Thanks Milad On Mon, May 12, 2008 at 1:43 PM, wrote: > Milad, > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > the vectors separately. Things seem to work fine. > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > edu> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available 2 Questions about DAs > > > 05/12/2008 12:02 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." 
> > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? > > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... > > > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). > > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > > > From tsjb00 at hotmail.com Mon May 12 13:59:49 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Mon, 12 May 2008 18:59:49 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: Does PETSc include tools or examples to deal with such problems? If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. 
My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. Many thanks in advance! BJ From knepley at gmail.com Mon May 12 14:33:27 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 14:33:27 -0500 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: 2008/5/12 tsjb00 : > > Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. > > Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: > > Does PETSc include tools or examples to deal with such problems? Make a new DA for those vectors. DA are extremely small since they store O(1) data. > If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. > > I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. If you store that data in PETSc Vec format, you can just use VecLoad() and we will distribute everything for you. A simple way to do this, is to read it in on 1 process, put it in a Vec, and VecView(). Then you can read it back in on multiple processes after that. Matt > Many thanks in advance! > > BJ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Amit.Itagi at seagate.com Mon May 12 14:51:38 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 15:51:38 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: I don't understand your motivation for trying to have two consecutive scatterBegin before the scatterEnd of the first. I think that the scatterBegin and the corresponding scatterEnd need to thought of as a single scatter operation. Infact, I don't understand the concept of "overlapping" the scattering. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: 2 Questions about DAs 05/12/2008 02:58 PM Please respond to petsc-users at mcs.a nl.gov Hi, Thanks for the reply.
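Returning to the f(x,y,z) input-file question above, a hedged sketch of the write-once / load-in-parallel recipe Matt outlines. The file name and sizes are invented, VecLoad() is shown with its 2.3.x calling sequence, and note that the loaded vector is split into contiguous chunks, so for DA-shaped data one would normally push it into DA ordering afterwards (for example through a natural-order vector from DACreateNaturalVector() plus DANaturalToGlobalBegin/End()).

/* Step 1 (serial, or guarded by "if (!rank)" in a parallel job): write the
   raw values as a PETSc binary Vec. */
PetscErrorCode WritePropertyFile(const char fname[], PetscScalar *vals, PetscInt n)
{
  PetscErrorCode ierr;
  PetscViewer    viewer;
  Vec            f;
  PetscInt       i;

  PetscFunctionBegin;
  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &f);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = VecSetValue(f, i, vals[i], INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = VecAssemblyBegin(f);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(f);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, fname, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(f, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);
  ierr = VecDestroy(f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Step 2 (every process of the parallel run): load the same file; PETSc
   hands each process a contiguous share of the entries. */
PetscErrorCode ReadPropertyFile(const char fname[], Vec *f)
{
  PetscErrorCode ierr;
  PetscViewer    viewer;

  PetscFunctionBegin;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, fname, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = VecLoad(viewer, VECMPI, f);CHKERRQ(ierr);   /* 2.3.x calling sequence */
  ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}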
If I have two global vectors from the same DA, say global1 and global2 and I try to scatter to local vectors local1 and local2 using the command DAGlobalToLocalBegin/End in the following manner: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); Everything is fine. If instead I do: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); I get the error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! What I would like to be able to do is overlap the scattering and that produces the error. Thanks Milad On Mon, May 12, 2008 at 1:43 PM, wrote: > Milad, > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > the vectors separately. Things seem to work fine. > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > edu> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available 2 Questions about DAs > > > 05/12/2008 12:02 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." > > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? > > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... > > > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. 
It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). > > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > > > From icksa1 at gmail.com Mon May 12 15:01:49 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 15:01:49 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hello: I've attached the result of two calculations. The file "log-multi-da" uses 1 DA for each vector (322 in all) and the file "log-single-da" using 1 DA for the entire calculation. When using 322 DA's, about 10x more time is spent in VecScatterBegin and VecScatterEnd. Both were running using two processes I should mention that the source code for these two runs was exactly the same, I didn't reorder the scatters differently. The only difference was the number of DAs Any suggestions? Do you think this is related to the number of DA's, or something else? Thanks for your help Milad On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > It would be nice to get variable structs in C. However, you can just deference > the object directly. For example, for 50 degrees of freedom, you can do > > array[k][j][i][47] *= -1.0; > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > These are all independent objects. Thus, by itself, creating any number of > Vecs does nothing to performance (unless you start to run out of memory). > > > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? 
> > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > see above. > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > The problem here is that buffering will have to be done for each outstanding > scatter. Thus I see two resolutions: > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > This is essentially what you accomplish with separate DAs. > > 2) You the dof method. However, this scatter ALL the vectors every time. > > I do not understand what performance problem you would have with multiple > DAs. With any performance questions, we suggest sending the output of > -log_summary so we have data to look at. > > Matt > > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- A non-text attachment was scrubbed... Name: log-multi-da Type: application/octet-stream Size: 11372 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log-single-da Type: application/octet-stream Size: 11372 bytes Desc: not available URL: From knepley at gmail.com Mon May 12 15:15:45 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 15:15:45 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > Hello: > I've attached the result of two calculations. The file "log-multi-da" > uses 1 DA for each vector (322 in all) and the file "log-single-da" > using 1 DA for the entire calculation. When using 322 DA's, about 10x > more time is spent in VecScatterBegin and VecScatterEnd. Both were > running using two processes > > I should mention that the source code for these two runs was exactly > the same, I didn't reorder the scatters differently. The only > difference was the number of DAs > > Any suggestions? Do you think this is related to the number of DA's, > or something else? There are vastly different numbers of reductions and much bigger memory usage. Please send the code and I will look at it. 
Matt > Thanks for your help > Milad > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > Hello: > > > I have two separate DA questions: > > > > > > 1) I am writing a large finite difference code and would like to be > > > able to represent an array of vectors. I am currently doing this by > > > creating a single DA and calling DACreateGlobalVector several times, > > > but the manual also states that: > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > same distributed array communication; note, however, that the dof > > > parameter handles many cases of interest." > > > > > > I also found the following mailing list thread which describes how to > > > use the dof parameter to represent several vectors: > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > Where the following solution is proposed: > > > """ > > > The easiest thing to do in C is to declare a struct: > > > > > > typedef struct { > > > PetscScalar v[3]; > > > PetscScalar p; > > > } Space; > > > > > > and then cast pointers > > > > > > Space ***array; > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > array[k][j][i].v *= -1.0; > > > """ > > > > > > The problem with the proposed solution, is that they use a struct to > > > get the individual values, but what if you don't know the number of > > > degrees of freedom at compile time? > > > > It would be nice to get variable structs in C. However, you can just deference > > the object directly. For example, for 50 degrees of freedom, you can do > > > > array[k][j][i][47] *= -1.0; > > > > > > > So my question is two fold: > > > a) Is there a problem with just having a single DA and calling > > > DACreateGlobalVector multiple times? Does this affect performance at > > > all (I have many different vectors)? > > > > These are all independent objects. Thus, by itself, creating any number of > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > number of degrees of freedom is not known at compile time? > > > Specifically, I would like to be able to access the individual values > > > of the vector, just like the example shows... > > > > > > see above. > > > > > 2) The code I am writing has a lot of different parts which present a > > > lot of opportunities to overlap communication an computation when > > > scattering vectors to update values in the ghost points. Right now, > > > all of my vectors (there are ~50 of them) share a single DA because > > > they all have the same shape. However, by sharing a single DA, I can > > > only scatter one vector at a time. It would be nice to be able to > > > start scattering each vector right after I'm done computing it, and > > > finish scattering it right before I need it again but I can't because > > > other vectors might need to be scattered in between. I then re-wrote > > > part of my code so that each vector had its own DA object, but this > > > ended up being incredibly slow (I assume this is because I have so > > > many vectors). > > > > The problem here is that buffering will have to be done for each outstanding > > scatter. Thus I see two resolutions: > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > This is essentially what you accomplish with separate DAs. > > > > 2) You the dof method. 
However, this scatter ALL the vectors every time. > > > > I do not understand what performance problem you would have with multiple > > DAs. With any performance questions, we suggest sending the output of > > -log_summary so we have data to look at. > > > > Matt > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > simultaneously without affecting the performance of the code? Does it > > > make sense to do this? > > > > > > > > > I'd really appreciate any help... > > > > > > Thanks > > > Milad Fatenejad > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From icksa1 at gmail.com Mon May 12 15:18:34 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 15:18:34 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi Amit: I was thinking of the following situation. There are two vectors global/local1 and global/local2. Function f1 modifies the values of global1 and global2. Function f3 requires the updated ghost point values. Function f2 doesn't depend on either vector at all, but is really computationally intensive I would like to do the following: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values If I don't overlap the scattering, I end up with something like this: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); // Nothing left to do, just wait for scattering to end DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values In the second case, there is nothing left to do while the second vector scatters and it seems like I just have to wait for this to occur. Ideally, I would like to scatter both vectors while waiting for f2() to finish... I'm a little new to all of this, so let me know if my understanding is just wrong... Thanks Milad On Mon, May 12, 2008 at 2:51 PM, wrote: > I don't understand your motivation for trying to have two consecutive > scatterBegin before the scatterEnd of the first. I think that the > scatterBegin and the corresponding scatterEnd need to thought of as a > single scatter operation. Infact, I don't understand the concept of > "overlapping" the scattering. > > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > > > To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: 2 Questions about DAs > > > 05/12/2008 02:58 > > > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > Hi, > Thanks for the reply. 
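One way to get the overlap sketched in the f1/f2/f3 message above, given that each DA owns exactly one global-to-local scatter context: keep a second DA created with identical arguments (so the layouts match) and route the second in-flight ghost update through it. This is only a sketch of Matt's resolution 1, duplicating the communication pattern a small fixed number of times rather than once per vector, and f1/f2/f3 are the user routines from that message.

/* f1, f2, f3 are the user routines from the message above. */
extern void f1(Vec, Vec);
extern void f2(void);
extern void f3(Vec, Vec);

/* da and da_aux are assumed to have been created by two DACreate3d() calls
   with identical arguments, so vectors from either share the same layout.
   At most two scatters are in flight, no matter how many Vecs the code has. */
PetscErrorCode StepWithOverlap(DA da, DA da_aux, Vec global1, Vec global2,
                               Vec local1, Vec local2)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  f1(global1, global2);

  ierr = DAGlobalToLocalBegin(da,     global1, INSERT_VALUES, local1);CHKERRQ(ierr);
  ierr = DAGlobalToLocalBegin(da_aux, global2, INSERT_VALUES, local2);CHKERRQ(ierr);

  f2();   /* long, independent computation overlapped with both ghost updates */

  ierr = DAGlobalToLocalEnd(da,     global1, INSERT_VALUES, local1);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(da_aux, global2, INSERT_VALUES, local2);CHKERRQ(ierr);

  f3(local1, local2);
  PetscFunctionReturn(0);
}

Whether the overlap actually pays off depends on how much of the communication the MPI implementation progresses during f2, which is worth checking with -log_summary.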
> > If I have two global vectors from the same DA, say global1 and global2 > and I try to scatter to local vectors local1 and local2 using the > command DAGlobalToLocalBegin/End in the following manner: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > Everything is fine. If instead I do: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > I get the error: > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Scatter ctx already in use! > > What I would like to be able to do is overlap the scattering and that > produces the error. > > Thanks > Milad > > On Mon, May 12, 2008 at 1:43 PM, wrote: > > Milad, > > > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > > the vectors separately. Things seem to work fine. > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > "Milad Fatenejad" > > > edu> > To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users > cc > > @mcs.anl.gov > > No Phone Info > Subject > > Available 2 Questions about DAs > > > > > > 05/12/2008 12:02 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? > > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. 
Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > > > > > From Amit.Itagi at seagate.com Mon May 12 16:14:07 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 17:14:07 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: May be, if you compare the time taken to scatter and the time taken by f2, one of them might dominate the other. Thus, in any case, the slower of the two will determine the time to complete the scattering and f2 execution. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: 2 Questions about DAs 05/12/2008 04:18 PM Please respond to petsc-users at mcs.a nl.gov Hi Amit: I was thinking of the following situation. There are two vectors global/local1 and global/local2. Function f1 modifies the values of global1 and global2. Function f3 requires the updated ghost point values. Function f2 doesn't depend on either vector at all, but is really computationally intensive I would like to do the following: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values If I don't overlap the scattering, I end up with something like this: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); // Nothing left to do, just wait for scattering to end DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values In the second case, there is nothing left to do while the second vector scatters and it seems like I just have to wait for this to occur. Ideally, I would like to scatter both vectors while waiting for f2() to finish... I'm a little new to all of this, so let me know if my understanding is just wrong... Thanks Milad On Mon, May 12, 2008 at 2:51 PM, wrote: > I don't understand your motivation for trying to have two consecutive > scatterBegin before the scatterEnd of the first. I think that the > scatterBegin and the corresponding scatterEnd need to thought of as a > single scatter operation. Infact, I don't understand the concept of > "overlapping" the scattering. 
> > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > > > To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: 2 Questions about DAs > > > 05/12/2008 02:58 > > > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > Hi, > Thanks for the reply. > > If I have two global vectors from the same DA, say global1 and global2 > and I try to scatter to local vectors local1 and local2 using the > command DAGlobalToLocalBegin/End in the following manner: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > Everything is fine. If instead I do: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > I get the error: > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Scatter ctx already in use! > > What I would like to be able to do is overlap the scattering and that > produces the error. > > Thanks > Milad > > On Mon, May 12, 2008 at 1:43 PM, wrote: > > Milad, > > > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > > the vectors separately. Things seem to work fine. > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > "Milad Fatenejad" > > > edu> > To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users > cc > > @mcs.anl.gov > > No Phone Info > Subject > > Available 2 Questions about DAs > > > > > > 05/12/2008 12:02 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? 
> > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > > > > > From icksa1 at gmail.com Mon May 12 16:28:50 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 16:28:50 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi Matt: The code is several thousand lines long, requires many external libraries and is generally very messy right now. I'd rather not send it because I wouldn't want to take up too much of your time. I think I will try to go back and try set up some simpler problems to test the difference between 1 vs. many DA's, and will write back if I have the same issue. Thank you Milad On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley wrote: > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > > Hello: > > I've attached the result of two calculations. The file "log-multi-da" > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > running using two processes > > > > I should mention that the source code for these two runs was exactly > > the same, I didn't reorder the scatters differently. The only > > difference was the number of DAs > > > > Any suggestions? Do you think this is related to the number of DA's, > > or something else? > > There are vastly different numbers of reductions and much bigger memory usage. > Please send the code and I will look at it. > > Matt > > > > > Thanks for your help > > Milad > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > > Hello: > > > > I have two separate DA questions: > > > > > > > > 1) I am writing a large finite difference code and would like to be > > > > able to represent an array of vectors. I am currently doing this by > > > > creating a single DA and calling DACreateGlobalVector several times, > > > > but the manual also states that: > > > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > > same distributed array communication; note, however, that the dof > > > > parameter handles many cases of interest." 
> > > > > > > > I also found the following mailing list thread which describes how to > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > Where the following solution is proposed: > > > > """ > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > typedef struct { > > > > PetscScalar v[3]; > > > > PetscScalar p; > > > > } Space; > > > > > > > > and then cast pointers > > > > > > > > Space ***array; > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > array[k][j][i].v *= -1.0; > > > > """ > > > > > > > > The problem with the proposed solution, is that they use a struct to > > > > get the individual values, but what if you don't know the number of > > > > degrees of freedom at compile time? > > > > > > It would be nice to get variable structs in C. However, you can just deference > > > the object directly. For example, for 50 degrees of freedom, you can do > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > So my question is two fold: > > > > a) Is there a problem with just having a single DA and calling > > > > DACreateGlobalVector multiple times? Does this affect performance at > > > > all (I have many different vectors)? > > > > > > These are all independent objects. Thus, by itself, creating any number of > > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > > number of degrees of freedom is not known at compile time? > > > > Specifically, I would like to be able to access the individual values > > > > of the vector, just like the example shows... > > > > > > > > > see above. > > > > > > > 2) The code I am writing has a lot of different parts which present a > > > > lot of opportunities to overlap communication an computation when > > > > scattering vectors to update values in the ghost points. Right now, > > > > all of my vectors (there are ~50 of them) share a single DA because > > > > they all have the same shape. However, by sharing a single DA, I can > > > > only scatter one vector at a time. It would be nice to be able to > > > > start scattering each vector right after I'm done computing it, and > > > > finish scattering it right before I need it again but I can't because > > > > other vectors might need to be scattered in between. I then re-wrote > > > > part of my code so that each vector had its own DA object, but this > > > > ended up being incredibly slow (I assume this is because I have so > > > > many vectors). > > > > > > The problem here is that buffering will have to be done for each outstanding > > > scatter. Thus I see two resolutions: > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > > This is essentially what you accomplish with separate DAs. > > > > > > 2) You the dof method. However, this scatter ALL the vectors every time. > > > > > > I do not understand what performance problem you would have with multiple > > > DAs. With any performance questions, we suggest sending the output of > > > -log_summary so we have data to look at. > > > > > > Matt > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > simultaneously without affecting the performance of the code? Does it > > > > make sense to do this? > > > > > > > > > > > > I'd really appreciate any help... 
> > > > > > > > Thanks > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > From icksa1 at gmail.com Mon May 12 18:21:07 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 18:21:07 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi: I created a simple test problem that demonstrates the issue. In the test problem, 100 vectors are created using: single.cpp: a single distributed array and multi.cpp: 100 distributed arrays Some math is performed on the vectors, then they are scattered to local vectors.. The log summary (running 2 processes) shows that multi.cpp uses more memory and performs more reductions than single.cpp, which is similar to the experience I had with my program... I hope this helps Milad On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley wrote: > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > > Hello: > > I've attached the result of two calculations. The file "log-multi-da" > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > running using two processes > > > > I should mention that the source code for these two runs was exactly > > the same, I didn't reorder the scatters differently. The only > > difference was the number of DAs > > > > Any suggestions? Do you think this is related to the number of DA's, > > or something else? > > There are vastly different numbers of reductions and much bigger memory usage. > Please send the code and I will look at it. > > Matt > > > > > Thanks for your help > > Milad > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > > Hello: > > > > I have two separate DA questions: > > > > > > > > 1) I am writing a large finite difference code and would like to be > > > > able to represent an array of vectors. I am currently doing this by > > > > creating a single DA and calling DACreateGlobalVector several times, > > > > but the manual also states that: > > > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > > same distributed array communication; note, however, that the dof > > > > parameter handles many cases of interest." 
> > > > > > > > I also found the following mailing list thread which describes how to > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > Where the following solution is proposed: > > > > """ > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > typedef struct { > > > > PetscScalar v[3]; > > > > PetscScalar p; > > > > } Space; > > > > > > > > and then cast pointers > > > > > > > > Space ***array; > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > array[k][j][i].v *= -1.0; > > > > """ > > > > > > > > The problem with the proposed solution, is that they use a struct to > > > > get the individual values, but what if you don't know the number of > > > > degrees of freedom at compile time? > > > > > > It would be nice to get variable structs in C. However, you can just deference > > > the object directly. For example, for 50 degrees of freedom, you can do > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > So my question is two fold: > > > > a) Is there a problem with just having a single DA and calling > > > > DACreateGlobalVector multiple times? Does this affect performance at > > > > all (I have many different vectors)? > > > > > > These are all independent objects. Thus, by itself, creating any number of > > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > > number of degrees of freedom is not known at compile time? > > > > Specifically, I would like to be able to access the individual values > > > > of the vector, just like the example shows... > > > > > > > > > see above. > > > > > > > 2) The code I am writing has a lot of different parts which present a > > > > lot of opportunities to overlap communication an computation when > > > > scattering vectors to update values in the ghost points. Right now, > > > > all of my vectors (there are ~50 of them) share a single DA because > > > > they all have the same shape. However, by sharing a single DA, I can > > > > only scatter one vector at a time. It would be nice to be able to > > > > start scattering each vector right after I'm done computing it, and > > > > finish scattering it right before I need it again but I can't because > > > > other vectors might need to be scattered in between. I then re-wrote > > > > part of my code so that each vector had its own DA object, but this > > > > ended up being incredibly slow (I assume this is because I have so > > > > many vectors). > > > > > > The problem here is that buffering will have to be done for each outstanding > > > scatter. Thus I see two resolutions: > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > > This is essentially what you accomplish with separate DAs. > > > > > > 2) You the dof method. However, this scatter ALL the vectors every time. > > > > > > I do not understand what performance problem you would have with multiple > > > DAs. With any performance questions, we suggest sending the output of > > > -log_summary so we have data to look at. > > > > > > Matt > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > simultaneously without affecting the performance of the code? Does it > > > > make sense to do this? > > > > > > > > > > > > I'd really appreciate any help... 
> > > > > > > > Thanks > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- A non-text attachment was scrubbed... Name: log-multi Type: application/octet-stream Size: 10666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log-single Type: application/octet-stream Size: 10666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: multi.cpp Type: text/x-c++src Size: 1557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: single.cpp Type: text/x-c++src Size: 1449 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon May 12 19:07:33 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 May 2008 19:07:33 -0500 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> On May 12, 2008, at 1:59 PM, tsjb00 wrote: > > Hi, there! I am a beginner of PETSc and I have some questions about > using PETSc to solve for a multi-componet system. The code is > supposed to be applicable to different systems, where number of > components, properties of components ,etc. would be input for the > program. > > Say I define DA with dof=number of components = nc, number of grid > in x,y,z = nx,ny,nz respectively. When I use DA related functions, > it seems that by default the data objects (vectors, arrays, etc.) > would be of nx*ny*nz*nc. However, some physical variables are > independent of specific components, which means I need to handle > data objects of nx*ny*nz*integral. My questions are: > > Does PETSc include tools or examples to deal with such problems? > > If not, how can I make sure the 'nx*ny*nz*any integral' data objects > are distributed over the nodes in a way defined by DA? I am using > PETSc_Decide for partitioning right now. I would prefer that at > least the number of processors be flexible. I do not understand your question but here is a stab at it. For each different nc you need in your code you simply create a different DA. For example if you have two fields you want together and also have three fields together you would create one DA for the 2 dof and one for the 3 dof. Adding a couple more DA's won't take much memory or time. You can use DAVecGetArray() to access the values for a fixed dof, if sometimes you want nc to be different for different runs within the same loops you can use DAVecGetArrayDOF(). You access values via x[k][j][i] [l] where l goes from 0 to dof-1 for the dof that you used to create the DA. Barry > > > I need to read in a property f(x,y,z) from a data file and then > distribute the data across different processors. Any suggestions on > this would be appreciated. My concern is that if I use MPI_Send/ > Receive, the data to be transferred might correspond to > discontinuous indices due to the partitioning. > > Many thanks in advance! 
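Barry's suggestion of one DA per distinct dof can be sketched as follows, using 2.3.x calls; the grid size, the field names and nc are illustrative. With the same global grid, the same number of processes and PETSC_DECIDE everywhere, the two DAs normally end up with matching decompositions, which is worth confirming with DAGetCorners() in a real code.

#include "petscda.h"

int main(int argc, char **argv)
{
  DA             da_comp, da_prop;
  Vec            conc, poro;        /* nc components per cell / one per cell */
  PetscInt       nc = 3;            /* number of components, e.g. from input */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* same grid, same PETSC_DECIDE partitioning, different dof */
  ierr = DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
                    40, 40, 40, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                    nc, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da_comp);CHKERRQ(ierr);
  ierr = DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
                    40, 40, 40, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                    1, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da_prop);CHKERRQ(ierr);

  ierr = DACreateGlobalVector(da_comp, &conc);CHKERRQ(ierr); /* nx*ny*nz*nc entries */
  ierr = DACreateGlobalVector(da_prop, &poro);CHKERRQ(ierr); /* nx*ny*nz entries    */

  /* access: x[k][j][i][c] via DAVecGetArrayDOF(da_comp, conc, ...) and
     p[k][j][i] via DAVecGetArray(da_prop, poro, ...), over the same
     (i,j,k) ranges returned by DAGetCorners() */

  ierr = VecDestroy(conc);CHKERRQ(ierr);
  ierr = VecDestroy(poro);CHKERRQ(ierr);
  ierr = DADestroy(da_comp);CHKERRQ(ierr);
  ierr = DADestroy(da_prop);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}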
> > BJ > >> > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > From bsmith at mcs.anl.gov Mon May 12 19:22:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 May 2008 19:22:04 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: A couple of items. Overlapping communication and computation is pretty much a myth. The CPU is used by MPI to pack the messages and put them on the network so it is not available for computation during this time. Usually if you try to overlap communication and computation it will end up being slower and I've never seen it faster. Vendors will try to trick you into buying a machine by saying it does it, but it really doesn't. Just forget about trying to do it. Creating a DA involves a good amount of setup and some communication; it is fine to use a few DA's but setting up hundreds of DAs is not a good idea UNLESS YOU DO TONS OF WORK for each DA. In your case you are doing just a tiny amount of communication with each DA so the DA setup time is dominating. If you have hundreds of vectors that you wish to communicate AT THE SAME TIME (seems strange but I suppose it is possible), then rather than having hundreds of DAGlobalToLocalBegin/End() in a row you will want to create an additional "meta" DA that has the same m,n,p as the regular DA but has a dof equal to the number of vectors you wish to communicate at the same time. Use VecStrideScatterAll() to get the individual vectors into a meta vector, do the DAGlobalToLocalBegin/End() on the meta vector to get the ghost values and then use DAStrideGatherAll() to get the values into the 322 individual ghosted vectors. The reason to do it this way is so the values in all the vectors are all sent together in a single MPI message instead of the separate message that would needed for each of the small DAGlobalToLocalBegin/End(). Barry On May 12, 2008, at 6:21 PM, Milad Fatenejad wrote: > Hi: > I created a simple test problem that demonstrates the issue. In the > test problem, 100 vectors are created using: > single.cpp: a single distributed array and > multi.cpp: 100 distributed arrays > > Some math is performed on the vectors, then they are scattered to > local vectors.. > > The log summary (running 2 processes) shows that multi.cpp uses more > memory and performs more reductions than single.cpp, which is similar > to the experience I had with my program... > > I hope this helps > Milad > > On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley > wrote: >> On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad >> wrote: >>> Hello: >>> I've attached the result of two calculations. The file "log-multi- >>> da" >>> uses 1 DA for each vector (322 in all) and the file "log-single-da" >>> using 1 DA for the entire calculation. When using 322 DA's, about >>> 10x >>> more time is spent in VecScatterBegin and VecScatterEnd. Both were >>> running using two processes >>> >>> I should mention that the source code for these two runs was exactly >>> the same, I didn't reorder the scatters differently. The only >>> difference was the number of DAs >>> >>> Any suggestions? Do you think this is related to the number of DA's, >>> or something else? >> >> There are vastly different numbers of reductions and much bigger >> memory usage. >> Please send the code and I will look at it. 
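The "meta DA" idea from Barry's message above, written out as a sketch. Assumptions: PETSc 2.3.x names; dameta was created with the same grid and m, n, p as the dof=1 DA used for the individual fields, but with dof equal to the number of fields; gvec[] are global vectors and lvec[] local (ghosted) vectors of the dof=1 DA, in matching order; and the routine Barry calls DAStrideGatherAll() is taken to be VecStrideGatherAll(). The exact argument order of the VecStride*All calls should be checked against the man pages of the installed version.

#include "petscda.h"

/* Ghost-update many single-component vectors with one message per
   neighbour by packing them into an interlaced "meta" vector. */
PetscErrorCode UpdateGhostsTogether(DA dameta, Vec gvec[], Vec lvec[])
{
  Vec            gmeta, lmeta;   /* dof = number-of-fields work vectors */
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* in a real code these two would be created once and reused */
  ierr = DACreateGlobalVector(dameta, &gmeta);CHKERRQ(ierr);
  ierr = DACreateLocalVector(dameta, &lmeta);CHKERRQ(ierr);

  /* pack the separate global vectors into the interlaced meta vector */
  ierr = VecStrideScatterAll(gvec, gmeta, INSERT_VALUES);CHKERRQ(ierr);

  /* a single ghost exchange moves all fields at once */
  ierr = DAGlobalToLocalBegin(dameta, gmeta, INSERT_VALUES, lmeta);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(dameta, gmeta, INSERT_VALUES, lmeta);CHKERRQ(ierr);

  /* unpack into the individual ghosted (local) vectors */
  ierr = VecStrideGatherAll(lmeta, lvec, INSERT_VALUES);CHKERRQ(ierr);

  ierr = VecDestroy(gmeta);CHKERRQ(ierr);
  ierr = VecDestroy(lmeta);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The point is simply that all fields travel in one VecScatter instead of one message exchange per field.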
>> >> Matt >> >> >> >>> Thanks for your help >>> Milad >>> >>> On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley >>> wrote: >>>> >>>> On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad >>> > wrote: >>>>> Hello: >>>>> I have two separate DA questions: >>>>> >>>>> 1) I am writing a large finite difference code and would like to >>>>> be >>>>> able to represent an array of vectors. I am currently doing this >>>>> by >>>>> creating a single DA and calling DACreateGlobalVector several >>>>> times, >>>>> but the manual also states that: >>>>> >>>>> "PETSc currently provides no container for multiple arrays >>>>> sharing the >>>>> same distributed array communication; note, however, that the dof >>>>> parameter handles many cases of interest." >>>>> >>>>> I also found the following mailing list thread which describes >>>>> how to >>>>> use the dof parameter to represent several vectors: >>>>> >>>>> >>>>> http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html >>>>> >>>>> Where the following solution is proposed: >>>>> """ >>>>> The easiest thing to do in C is to declare a struct: >>>>> >>>>> typedef struct { >>>>> PetscScalar v[3]; >>>>> PetscScalar p; >>>>> } Space; >>>>> >>>>> and then cast pointers >>>>> >>>>> Space ***array; >>>>> >>>>> DAVecGetArray(da, u, (void *) &array); >>>>> >>>>> array[k][j][i].v *= -1.0; >>>>> """ >>>>> >>>>> The problem with the proposed solution, is that they use a >>>>> struct to >>>>> get the individual values, but what if you don't know the number >>>>> of >>>>> degrees of freedom at compile time? >>>> >>>> It would be nice to get variable structs in C. However, you can >>>> just deference >>>> the object directly. For example, for 50 degrees of freedom, you >>>> can do >>>> >>>> array[k][j][i][47] *= -1.0; >>>> >>>> >>>>> So my question is two fold: >>>>> a) Is there a problem with just having a single DA and calling >>>>> DACreateGlobalVector multiple times? Does this affect >>>>> performance at >>>>> all (I have many different vectors)? >>>> >>>> These are all independent objects. Thus, by itself, creating any >>>> number of >>>> Vecs does nothing to performance (unless you start to run out of >>>> memory). >>>> >>>> >>>>> b) Is there a way to use the dof parameter when creating a DA >>>>> when the >>>>> number of degrees of freedom is not known at compile time? >>>>> Specifically, I would like to be able to access the individual >>>>> values >>>>> of the vector, just like the example shows... >>>> >>>> >>>> see above. >>>> >>>>> 2) The code I am writing has a lot of different parts which >>>>> present a >>>>> lot of opportunities to overlap communication an computation when >>>>> scattering vectors to update values in the ghost points. Right >>>>> now, >>>>> all of my vectors (there are ~50 of them) share a single DA >>>>> because >>>>> they all have the same shape. However, by sharing a single DA, I >>>>> can >>>>> only scatter one vector at a time. It would be nice to be able to >>>>> start scattering each vector right after I'm done computing it, >>>>> and >>>>> finish scattering it right before I need it again but I can't >>>>> because >>>>> other vectors might need to be scattered in between. I then re- >>>>> wrote >>>>> part of my code so that each vector had its own DA object, but >>>>> this >>>>> ended up being incredibly slow (I assume this is because I have so >>>>> many vectors). >>>> >>>> The problem here is that buffering will have to be done for each >>>> outstanding >>>> scatter. 
Thus I see two resolutions: >>>> >>>> 1) Duplicate the DA scatter for as many Vecs as you wish to >>>> scatter at once. >>>> This is essentially what you accomplish with separate DAs. >>>> >>>> 2) You the dof method. However, this scatter ALL the vectors >>>> every time. >>>> >>>> I do not understand what performance problem you would have with >>>> multiple >>>> DAs. With any performance questions, we suggest sending the >>>> output of >>>> -log_summary so we have data to look at. >>>> >>>> Matt >>>> >>>> >>>> >>>>> My question is, is there a way to scatter multiple vectors >>>>> simultaneously without affecting the performance of the code? >>>>> Does it >>>>> make sense to do this? >>>>> >>>>> >>>>> I'd really appreciate any help... >>>>> >>>>> Thanks >>>>> Milad Fatenejad >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to >>>> which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >> >> >> >> -- >> >> >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> > From neckel at in.tum.de Tue May 13 00:42:08 2008 From: neckel at in.tum.de (Tobias Neckel) Date: Tue, 13 May 2008 07:42:08 +0200 Subject: Question on matrix preallocation In-Reply-To: References: <48230924.7040600@in.tum.de> Message-ID: <48292A30.8040803@in.tum.de> > You are assembling before inserting values. This wipes out the preallocation > information since assembly shrinks the matrix to an optimal size. Sorry for being quite late, but thanks a lot, Matt! The assembling caused the problem also in my real code ... small cause, large effect. Now it is running fast :-) Best regards Tobias -- Dipl.-Tech. Math. Tobias Neckel Institut f?r Informatik V, TU M?nchen Boltzmannstr. 3, 85748 Garching Tel.: 089/289-18602 Email: neckel at in.tum.de URL: http://www5.in.tum.de/persons/neckel.html From gaatenek at irisa.fr Tue May 13 08:10:02 2008 From: gaatenek at irisa.fr (gaatenek at irisa.fr) Date: Tue, 13 May 2008 15:10:02 +0200 (CEST) Subject: MatILUDTFactor In-Reply-To: <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> References: <48209347.1060500@fluorem.com> <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> Message-ID: <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> Hello, I am trying to use MatILUDTFactor make an incomplete factorisation for preconditionner. I did not if it work correctly. In general case, when you reduce you drop tolerance criteria your preconditionner become better, but I all example that I use he dot not change. I dont know if I use it well? Do I have to install SPARSEKIT2 to make it work well? Guy Atenekeng From knepley at gmail.com Tue May 13 08:21:14 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 13 May 2008 08:21:14 -0500 Subject: MatILUDTFactor In-Reply-To: <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> References: <48209347.1060500@fluorem.com> <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> Message-ID: On Tue, May 13, 2008 at 8:10 AM, wrote: > Hello, > I am trying to use MatILUDTFactor make an incomplete factorisation for > preconditioner. > I did not if it work correctly. In general case, when you reduce you drop > tolerance criteria your preconditioner become better, but I all example > that I use he dot not change. 
I don't know if I use it well? Do I have to > install SPARSEKIT2 to make it work well? 1) ILU does not necessarily improve with increasing fill. There are no theoretical results for this PC. 2) I would first run with -ksp_view to see exactly what you have Matt > Guy Atenekeng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From tsjb00 at hotmail.com Tue May 13 10:14:05 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Tue, 13 May 2008 15:14:05 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> References: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> Message-ID: Many thanks for your help! ---------------------------------------- > From: bsmith at mcs.anl.gov > To: petsc-users at mcs.anl.gov > Subject: Re: Q. of multi-componet system and data from input file > Date: Mon, 12 May 2008 19:07:33 -0500 > > > On May 12, 2008, at 1:59 PM, tsjb00 wrote: > >> >> Hi, there! I am a beginner of PETSc and I have some questions about >> using PETSc to solve for a multi-componet system. The code is >> supposed to be applicable to different systems, where number of >> components, properties of components ,etc. would be input for the >> program. >> >> Say I define DA with dof=number of components = nc, number of grid >> in x,y,z = nx,ny,nz respectively. When I use DA related functions, >> it seems that by default the data objects (vectors, arrays, etc.) >> would be of nx*ny*nz*nc. However, some physical variables are >> independent of specific components, which means I need to handle >> data objects of nx*ny*nz*integral. My questions are: >> >> Does PETSc include tools or examples to deal with such problems? >> >> If not, how can I make sure the 'nx*ny*nz*any integral' data objects >> are distributed over the nodes in a way defined by DA? I am using >> PETSc_Decide for partitioning right now. I would prefer that at >> least the number of processors be flexible. > > I do not understand your question but here is a stab at it. For > each different nc you need in your code you simply create a different > DA. For example if you have > two fields you want together and also have three fields together you > would create one DA for the 2 dof and one for the 3 dof. Adding a > couple more DA's won't take > much memory or time. > > You can use DAVecGetArray() to access the values for a fixed dof, > if sometimes you want nc to be different for different runs within the > same > loops you can use DAVecGetArrayDOF(). You access values via x[k][j][i] > [l] where l goes from 0 to dof-1 for the dof that you used to create > the DA. > > Barry > >> >> >> I need to read in a property f(x,y,z) from a data file and then >> distribute the data across different processors. Any suggestions on >> this would be appreciated. My concern is that if I use MPI_Send/ >> Receive, the data to be transferred might correspond to >> discontinuous indices due to the partitioning. >> >> Many thanks in advance! >> >> BJ >> >>> >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> >> > _________________________________________________________________ Windows Live Photo gallery ????????????????????????????? 
http://get.live.cn/product/photo.html From tsjb00 at hotmail.com Tue May 13 10:14:38 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Tue, 13 May 2008 15:14:38 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: Many thanks for the reply! It really helps! ---------------------------------------- > Date: Mon, 12 May 2008 14:33:27 -0500 > From: knepley at gmail.com > To: petsc-users at mcs.anl.gov > Subject: Re: Q. of multi-componet system and data from input file > > 2008/5/12 tsjb00 : >> >> Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. >> >> Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: >> >> Does PETSc include tools or examples to deal with such problems? > > Make a new DA for those vectors. DA are extremely small since they > store O(1) data. > >> If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. >> >> I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. > > If you store that data in PETSc Vec format, you can just use VecLoad() > and we will distribute everything for you. A > simple way to do this, is to read it in on 1 process, put it in a Vec, > and VecView(). Then you can read it back in on > multiple processes after that. > > Matt > >> Many thanks in advance! >> >> BJ >> >> > >> >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > _________________________________________________________________ ???MSN??????????????????? http://mobile.msn.com.cn/ From icksa1 at gmail.com Tue May 13 10:46:35 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Tue, 13 May 2008 10:46:35 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hello: Thanks for all of your help, this has helped me tremendously! Milad On Mon, May 12, 2008 at 7:22 PM, Barry Smith wrote: > > A couple of items. > > Overlapping communication and computation is pretty much a myth. The CPU > is used by MPI to pack > the messages and put them on the network so it is not available for > computation during this time. Usually > if you try to overlap communication and computation it will end up being > slower and I've never seen it faster. > Vendors will try to trick you into buying a machine by saying it does it, > but it really doesn't. Just forget about trying to do it. 
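Following the advice just quoted, the natural structure is to issue the Begin/End pair back to back, immediately before the stencil loop that needs the ghost values. A sketch for a dof=1 3d DA with 2.3.x names; it assumes a periodic DA (e.g. created with DA_XYZPERIODIC and stencil width 1) so that every owned point has ghost neighbours, and the 7-point average stands in for the real finite-difference stencil.

#include "petscda.h"

/* Update gnew from g: local exchange first, stencil loop second. */
PetscErrorCode StencilUpdate(DA da, Vec g, Vec gnew)
{
  Vec            local;            /* created here for brevity; reuse it in a real code */
  PetscScalar ***x, ***y;
  PetscInt       i, j, k, xs, ys, zs, xm, ym, zm;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DACreateLocalVector(da, &local);CHKERRQ(ierr);

  /* ghost update: Begin and End together, right before the values are
     needed; splitting them to "overlap" rarely buys anything in practice */
  ierr = DAGlobalToLocalBegin(da, g, INSERT_VALUES, local);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(da, g, INSERT_VALUES, local);CHKERRQ(ierr);

  ierr = DAVecGetArray(da, local, &x);CHKERRQ(ierr);
  ierr = DAVecGetArray(da, gnew, &y);CHKERRQ(ierr);
  ierr = DAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++)
    for (j = ys; j < ys + ym; j++)
      for (i = xs; i < xs + xm; i++)
        /* placeholder stencil; ghost entries of 'local' are valid here */
        y[k][j][i] = (x[k][j][i] + x[k][j][i-1] + x[k][j][i+1]
                      + x[k][j-1][i] + x[k][j+1][i]
                      + x[k-1][j][i] + x[k+1][j][i]) / 7.0;
  ierr = DAVecRestoreArray(da, local, &x);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(da, gnew, &y);CHKERRQ(ierr);

  ierr = VecDestroy(local);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}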
> > Creating a DA involves a good amount of setup and some communication; it > is fine to use a few DA's > but setting up hundreds of DAs is not a good idea UNLESS YOU DO TONS OF > WORK for each DA. > In your case you are doing just a tiny amount of communication with each > DA so the DA setup time > is dominating. > > If you have hundreds of vectors that you wish to communicate AT THE SAME > TIME (seems strange but > I suppose it is possible), then rather than having hundreds of > DAGlobalToLocalBegin/End() in a row > you will want to create an additional "meta" DA that has the same m,n,p as > the regular DA but has a > dof equal to the number of vectors you wish to communicate at the same > time. Use VecStrideScatterAll() > to get the individual vectors into a meta vector, do the > DAGlobalToLocalBegin/End() on the meta vector > to get the ghost values and then use DAStrideGatherAll() to get the values > into the 322 individual ghosted > vectors. The reason to do it this way is so the values in all the vectors > are all sent together in a single > MPI message instead of the separate message that would needed for each of > the small > DAGlobalToLocalBegin/End(). > > Barry > > > > > > On May 12, 2008, at 6:21 PM, Milad Fatenejad wrote: > > > > > > > > > > Hi: > > I created a simple test problem that demonstrates the issue. In the > > test problem, 100 vectors are created using: > > single.cpp: a single distributed array and > > multi.cpp: 100 distributed arrays > > > > Some math is performed on the vectors, then they are scattered to > > local vectors.. > > > > The log summary (running 2 processes) shows that multi.cpp uses more > > memory and performs more reductions than single.cpp, which is similar > > to the experience I had with my program... > > > > I hope this helps > > Milad > > > > On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley > wrote: > > > > > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad > wrote: > > > > > > > Hello: > > > > I've attached the result of two calculations. The file "log-multi-da" > > > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > > > running using two processes > > > > > > > > I should mention that the source code for these two runs was exactly > > > > the same, I didn't reorder the scatters differently. The only > > > > difference was the number of DAs > > > > > > > > Any suggestions? Do you think this is related to the number of DA's, > > > > or something else? > > > > > > > > > > There are vastly different numbers of reductions and much bigger memory > usage. > > > Please send the code and I will look at it. > > > > > > Matt > > > > > > > > > > > > > > > > Thanks for your help > > > > Milad > > > > > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley > wrote: > > > > > > > > > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad > wrote: > > > > > > > > > > > Hello: > > > > > > I have two separate DA questions: > > > > > > > > > > > > 1) I am writing a large finite difference code and would like to > be > > > > > > able to represent an array of vectors. 
I am currently doing this > by > > > > > > creating a single DA and calling DACreateGlobalVector several > times, > > > > > > but the manual also states that: > > > > > > > > > > > > "PETSc currently provides no container for multiple arrays sharing > the > > > > > > same distributed array communication; note, however, that the dof > > > > > > parameter handles many cases of interest." > > > > > > > > > > > > I also found the following mailing list thread which describes how > to > > > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > > > > > Where the following solution is proposed: > > > > > > """ > > > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > > > > > typedef struct { > > > > > > PetscScalar v[3]; > > > > > > PetscScalar p; > > > > > > } Space; > > > > > > > > > > > > and then cast pointers > > > > > > > > > > > > Space ***array; > > > > > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > > > > > array[k][j][i].v *= -1.0; > > > > > > """ > > > > > > > > > > > > The problem with the proposed solution, is that they use a struct > to > > > > > > get the individual values, but what if you don't know the number > of > > > > > > degrees of freedom at compile time? > > > > > > > > > > > > > > > > It would be nice to get variable structs in C. However, you can just > deference > > > > > the object directly. For example, for 50 degrees of freedom, you can > do > > > > > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > > > > > > > > > > > > So my question is two fold: > > > > > > a) Is there a problem with just having a single DA and calling > > > > > > DACreateGlobalVector multiple times? Does this affect performance > at > > > > > > all (I have many different vectors)? > > > > > > > > > > > > > > > > These are all independent objects. Thus, by itself, creating any > number of > > > > > Vecs does nothing to performance (unless you start to run out of > memory). > > > > > > > > > > > > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when > the > > > > > > number of degrees of freedom is not known at compile time? > > > > > > Specifically, I would like to be able to access the individual > values > > > > > > of the vector, just like the example shows... > > > > > > > > > > > > > > > > > > > > > see above. > > > > > > > > > > > > > > > > 2) The code I am writing has a lot of different parts which > present a > > > > > > lot of opportunities to overlap communication an computation when > > > > > > scattering vectors to update values in the ghost points. Right > now, > > > > > > all of my vectors (there are ~50 of them) share a single DA > because > > > > > > they all have the same shape. However, by sharing a single DA, I > can > > > > > > only scatter one vector at a time. It would be nice to be able to > > > > > > start scattering each vector right after I'm done computing it, > and > > > > > > finish scattering it right before I need it again but I can't > because > > > > > > other vectors might need to be scattered in between. I then > re-wrote > > > > > > part of my code so that each vector had its own DA object, but > this > > > > > > ended up being incredibly slow (I assume this is because I have so > > > > > > many vectors). > > > > > > > > > > > > > > > > The problem here is that buffering will have to be done for each > outstanding > > > > > scatter. 
Thus I see two resolutions: > > > > > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter > at once. > > > > > This is essentially what you accomplish with separate DAs. > > > > > > > > > > 2) You the dof method. However, this scatter ALL the vectors every > time. > > > > > > > > > > I do not understand what performance problem you would have with > multiple > > > > > DAs. With any performance questions, we suggest sending the output > of > > > > > -log_summary so we have data to look at. > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > > > simultaneously without affecting the performance of the code? Does > it > > > > > > make sense to do this? > > > > > > > > > > > > > > > > > > I'd really appreciate any help... > > > > > > > > > > > > Thanks > > > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > From balay at mcs.anl.gov Tue May 13 10:55:26 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 13 May 2008 10:55:26 -0500 (CDT) Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from ["Lars Rindorf" ] (fwd) Message-ID: From Lars.Rindorf at teknologisk.dk Tue May 13 08:49:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 15:49:19 +0200 Subject: Parallel petsc with external UMFPACK Message-ID: Hi I'm thinking about using petsc to solve a linear system (Ax=b) using parallelization on a couple of linux computers. It is very important for my system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative solvers perform very poorly. My question is: I can see that UMFPACK is not listed (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab le.html) on the petsc page. Are there any plans to expand petsc to also include petsc? The UMFPACK homepage says that there exist a parallization for UMFPACK by Steve Hadfield. Can his parallelization be used with petsc? I must add, that although I like numerics and maths I do not intend to program 'from scratch'. Kind regards Lars Rindorf _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue May 13 11:06:50 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 13 May 2008 11:06:50 -0500 (CDT) Subject: Subject: Parallel petsc with external UMFPACK In-Reply-To: References: Message-ID: > From: "Lars Rindorf" > To: > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important for > my system (electromagnetics) to use a direct solver, such as UMFPACK. > Iterative solvers perform very poorly. 
> > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab > le.html) on the petsc page. Are there any plans to expand petsc to also > include petsc? Its listed there. However it says parallel/complex support is not available. However - if you need a parallel direct solver - you might explore MUMPS. The other alternatives are SuperLU_DIST, spooles. Satish > The UMFPACK homepage says that there exist a parallization for UMFPACK > by Steve Hadfield. Can his parallelization be used with petsc? > > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf From Lars.Rindorf at teknologisk.dk Tue May 13 11:40:03 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 18:40:03 +0200 Subject: SV: Subject: Parallel petsc with external UMFPACK In-Reply-To: Message-ID: Hi Satish I'm sorry. My text should have read, that UMFPACK is listed as not available in parallel. My fault. MUMPS sounds very interesting being a multifrontal solver. I'll try it out. Thanks! KR, Lars -----Oprindelig meddelelse----- Fra: owner-petsc-users at mcs.anl.gov [mailto:owner-petsc-users at mcs.anl.gov] P? vegne af Satish Balay Sendt: 13. maj 2008 18:07 Til: petsc-users at mcs.anl.gov Emne: Re: Subject: Parallel petsc with external UMFPACK > From: "Lars Rindorf" > To: > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important > for my system (electromagnetics) to use a direct solver, such as UMFPACK. > Iterative solvers perform very poorly. > > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvert > ab > le.html) on the petsc page. Are there any plans to expand petsc to > also include petsc? Its listed there. However it says parallel/complex support is not available. However - if you need a parallel direct solver - you might explore MUMPS. The other alternatives are SuperLU_DIST, spooles. Satish > The UMFPACK homepage says that there exist a parallization for UMFPACK > by Steve Hadfield. Can his parallelization be used with petsc? > > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf From dalcinl at gmail.com Tue May 13 13:04:07 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 13 May 2008 15:04:07 -0300 Subject: PETSc and parallel direct solvers In-Reply-To: References: Message-ID: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. 
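A sketch of a solve routine that leaves the final choice to the options database, so that runtime options (including -ksp_view and the -matconvert_type recipe quoted further down) take effect; the calls follow the 2.3.3-era API and SolveDirect is an invented name.

#include "petscksp.h"

/* Direct solve of A x = b with the concrete factorization decided at
   run time from the options database. */
PetscErrorCode SolveDirect(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);  /* no Krylov iterations      */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);          /* LU as the "preconditioner" */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* -ksp_view, -pc_type, ...   */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Running with -ksp_view then prints which matrix type and factorization were actually used, which is exactly the check suggested above.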
Then you easily notice if you are using MUMPS or not. Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. Regards, > > Kind regards, Lars > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > Sendt: 13. maj 2008 18:54 > Til: Lars Rindorf > Emne: PETSc and parallel direct solvers > > > Dear lars, I saw you post to petsc-users, it bounced because you have to suscribe to the list > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > I usually do this to easy switch to use mumps. First, in the source code, after assembling your matrix, add the following > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > And then, when you actually run your program, add the following to the command line: > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type preonly -pc_type lu > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From Lars.Rindorf at teknologisk.dk Tue May 13 13:37:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 20:37:19 +0200 Subject: SV: PETSc and parallel direct solvers In-Reply-To: Message-ID: Dear Lisandro I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. Thanks. KR, Lars -----Oprindelig meddelelse----- Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] Sendt: 13. maj 2008 20:04 Til: Lars Rindorf Cc: petsc-users at mcs.anl.gov Emne: Re: PETSc and parallel direct solvers On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I have tried to compare MUMPS with UMFPACK for one of my systems. 
UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. Regards, > > Kind regards, Lars > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > Sendt: 13. maj 2008 18:54 > Til: Lars Rindorf > Emne: PETSc and parallel direct solvers > > > Dear lars, I saw you post to petsc-users, it bounced because you have > to suscribe to the list > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > I usually do this to easy switch to use mumps. First, in the source > code, after assembling your matrix, add the following > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > And then, when you actually run your program, add the following to the command line: > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > preonly -pc_type lu > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. 
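The recipe quoted above, condensed into code: everything here comes from Lisandro's description, with only the helper name invented, and it applies to petsc-2.3.3 as discussed.

#include "petscksp.h"

/* Lisandro's MatConvert step: call it once the matrix is fully
   assembled, before it is handed to the KSP.  With
   '-matconvert_type aijmumps' on the command line the matrix type, and
   with it the LU factorization, switches to MUMPS; without the option
   the call leaves the matrix unchanged. */
PetscErrorCode ConvertForRuntimeSolver(Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatConvert(*A, MATSAME, MAT_REUSE_MATRIX, A);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The matching run line from the recipe is: mpiexec -n <np> ./yourprogram -matconvert_type aijmumps -ksp_type preonly -pc_type lu, with -ksp_view added to confirm what was actually used.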
> > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dalcinl at gmail.com Tue May 13 18:54:07 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 13 May 2008 20:54:07 -0300 Subject: PETSc and parallel direct solvers In-Reply-To: References: Message-ID: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. Indeed. Tell me your conclusions, I would love to know your results... > In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. > Well, the first authon in the 1989 reference seems to be the same that Patrick Amestoy here http://graal.ens-lyon.fr/MUMPS/index.php?page=credits. As i warned, the link is dated. Better to give a try yourself!.. Regards > Thanks. KR, Lars > > > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 20:04 > Til: Lars Rindorf > Cc: petsc-users at mcs.anl.gov > Emne: Re: PETSc and parallel direct solvers > > > On 5/13/08, Lars Rindorf wrote: > > Dear Lisandro > > > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? > > Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... > > To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. > > Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. > > I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. 
> > > Regards, > > > > > > Kind regards, Lars > > > > -----Oprindelig meddelelse----- > > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 18:54 > > Til: Lars Rindorf > > Emne: PETSc and parallel direct solvers > > > > > > Dear lars, I saw you post to petsc-users, it bounced because you have > > to suscribe to the list > > > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > > > I usually do this to easy switch to use mumps. First, in the source > > code, after assembling your matrix, add the following > > > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > > > And then, when you actually run your program, add the following to the command line: > > > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > > preonly -pc_type lu > > > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From Lars.Rindorf at teknologisk.dk Tue May 13 08:49:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 15:49:19 +0200 Subject: Parallel petsc with external UMFPACK Message-ID: Hi I'm thinking about using petsc to solve a linear system (Ax=b) using parallelization on a couple of linux computers. It is very important for my system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative solvers perform very poorly. My question is: I can see that UMFPACK is not listed (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab le.html) on the petsc page. Are there any plans to expand petsc to also include petsc? The UMFPACK homepage says that there exist a parallization for UMFPACK by Steve Hadfield. Can his parallelization be used with petsc? I must add, that although I like numerics and maths I do not intend to program 'from scratch'. Kind regards Lars Rindorf _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbostandoust at yahoo.com Tue May 13 23:15:47 2008 From: mbostandoust at yahoo.com (Mehdi Bostandoost) Date: Tue, 13 May 2008 21:15:47 -0700 (PDT) Subject: PETSc and parallel direct solvers In-Reply-To: Message-ID: <527863.5972.qm@web33507.mail.mud.yahoo.com> Hi In my master thesis,I needed to use PETSC direct solvers. Because of that I preparesd a short report. I attached the report to this email. note: the cluster that we had was not a good cluster and this report goes back to 4 years ago when I used petsc2.1.6. I thought it might be helpful. Regards Mehdi Lisandro Dalcin wrote: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. Indeed. Tell me your conclusions, I would love to know your results... > In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. > Well, the first authon in the 1989 reference seems to be the same that Patrick Amestoy here http://graal.ens-lyon.fr/MUMPS/index.php?page=credits. As i warned, the link is dated. Better to give a try yourself!.. Regards > Thanks. KR, Lars > > > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 20:04 > Til: Lars Rindorf > Cc: petsc-users at mcs.anl.gov > Emne: Re: PETSc and parallel direct solvers > > > On 5/13/08, Lars Rindorf wrote: > > Dear Lisandro > > > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? > > Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... > > To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. > > Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. > > I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. > > > Regards, > > > > > > Kind regards, Lars > > > > -----Oprindelig meddelelse----- > > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. 
maj 2008 18:54 > > Til: Lars Rindorf > > Emne: PETSc and parallel direct solvers > > > > > > Dear lars, I saw you post to petsc-users, it bounced because you have > > to suscribe to the list > > > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > > > I usually do this to easy switch to use mumps. First, in the source > > code, after assembling your matrix, add the following > > > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > > > And then, when you actually run your program, add the following to the command line: > > > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > > preonly -pc_type lu > > > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Performance of PETSC direct solvers on the Beowulf Cluster.pdf Type: application/pdf Size: 127056 bytes Desc: 4129988398-Performance of PETSC direct solvers on the Beowulf Cluster.pdf URL: From knepley at gmail.com Wed May 14 06:58:42 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 May 2008 06:58:42 -0500 Subject: Parallel petsc with external UMFPACK In-Reply-To: References: Message-ID: On Tue, May 13, 2008 at 8:49 AM, Lars Rindorf wrote: > > > Hi > > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important for my > system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative > solvers perform very poorly. > > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertable.html) > on the petsc page. Are there any plans to expand petsc to also include > petsc? > > The UMFPACK homepage says that there exist a parallization for UMFPACK by > Steve Hadfield. Can his parallelization be used with petsc? If you can find it, please send the URL. I looked this morning and could not locate this parallel UMFPACK on the web, which makes me very suspicious. 
We only like to wrap supported software, not one-off projects that continually break because no one is maintaining them. Matt > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf > > > > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From tsjb00 at hotmail.com Wed May 14 18:16:21 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Wed, 14 May 2008 23:16:21 +0000 Subject: question about VecLoad Message-ID: Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); i++; idx=i; AOApplicationToPetsc(ao,1,&idx); VecSetValue(v0,idx,var,INSERT_VALUES); } then use VecView to output the binary file: PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); VecView(v0,viewer); It's wrong. Please let me know how to fix it. Many thanks! JB _________________________________________________________________ ?????????live mail???????? http://get.live.cn/product/mail.html From tsjb00 at hotmail.com Wed May 14 18:39:42 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Wed, 14 May 2008 23:39:42 +0000 Subject: question about VecLoad (pls disregard previous one) Message-ID: Sorry the previous message is wrong Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); i++; idx=i; AOApplicationToPetsc(ao,1,&idx); VecSetValue(v0,idx,var,INSERT_VALUES); } then use VecView to output the binary file: PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); VecView(v0,viewer); Please let me know if something is wrong. Many thanks! JB _________________________________________________________________ MSN ???????????????????? http://cn.msn.com From tsjb00 at hotmail.com Wed May 14 19:46:38 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 00:46:38 +0000 Subject: question about VecLoad (attached ) Message-ID: Sorry the message gets messed up again! Please check the attached text file of my questions. Sorry for the inconvenience! Many thanks in advance! JB ---------------------------------------- > From: tsjb00 at hotmail.com > To: petsc-users at mcs.anl.gov > Subject: question about VecLoad (pls disregard previous one) > Date: Wed, 14 May 2008 23:39:42 +0000 > > > Sorry the previous message is wrong > > Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: > for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ > fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); > i++; > idx=i; > AOApplicationToPetsc(ao,1,&idx); > VecSetValue(v0,idx,var,INSERT_VALUES); > } > then use VecView to output the binary file: > PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); > VecView(v0,viewer); > > Please let me know if something is wrong. 
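The loop in the message above was mangled in transit (the for-statement conditions were stripped), so for readability here is a reconstruction of what it presumably looks like. nx, ny, nz, fp, ao, v0 and viewer are assumed to come from the poster's own setup; this is a guess at the intent, not the original code:

  PetscInt    i = -1, idx, k, j, l;
  PetscScalar var;
  float       dumx, dumy, dumz, fval;

  /* x runs fastest, then y, then z:
     p(x0,y0,z0), p(x1,y0,z0), ..., p(x0,y1,z0), p(x1,y1,z0), ... */
  for (k = 0; k < nz; k++) {
    for (j = 0; j < ny; j++) {
      for (l = 0; l < nx; l++) {
        fscanf(fp, "%f %f %f %f\n", &dumx, &dumy, &dumz, &fval);
        i++;
        idx = i;                            /* natural (application) index */
        AOApplicationToPetsc(ao, 1, &idx);  /* map to PETSc global index   */
        var = fval;                         /* %f reads a float, not a PetscScalar */
        VecSetValue(v0, idx, var, INSERT_VALUES);
      }
    }
  }
  VecAssemblyBegin(v0);
  VecAssemblyEnd(v0);

  /* then dump to a PETSc binary file, as in the original message */
  PetscViewerBinaryOpen(PETSC_COMM_SELF, "out.dat", FILE_MODE_WRITE, &viewer);
  VecView(v0, viewer);

Note that the VecAssemblyBegin/End pair is needed after the VecSetValue calls before the vector can be viewed.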
> > Many thanks! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > _________________________________________________________________ ?????????????MSN????TA????? http://im.live.cn/emoticons/?ID=18 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: question.txt URL: From bsmith at mcs.anl.gov Wed May 14 20:28:51 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 May 2008 20:28:51 -0500 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Since PETSc is used from C or Fortran you are free to write any kind of code you want that reads in ASCII files anyway you want. As you have done it is good to save the vectors with a binary viewer because they are easy to read and write; but again you can write whatever code you want. As to whether you code is wrong, no one can say, just run it and test it. Barry On May 14, 2008, at 6:39 PM, tsjb00 wrote: > > Sorry the previous message is wrong > > Hi! I have a question about VecLoad. In my program, I need to read > in ordered data from an input file, which in an ordinary c program > would be as: > for (k=0; ky->z,ie > p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ > fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); > i++; > idx=i; > AOApplicationToPetsc(ao,1,&idx); > VecSetValue(v0,idx,var,INSERT_VALUES); > } > then use VecView to output the binary file: > > PetscViewerBinaryOpen > (PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); > VecView(v0,viewer); > > Please let me know if something is wrong. > > Many thanks! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > From rafaelsantoscoelho at gmail.com Wed May 14 22:30:01 2008 From: rafaelsantoscoelho at gmail.com (Rafael Santos Coelho) Date: Thu, 15 May 2008 00:30:01 -0300 Subject: Something weird with SNES convergence reason Message-ID: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Hello everybody, I've coded a program which solves, in parallel, the three-dimensional Bratu problem. Afterwards, I've run tests in a cluster to see how it would go and, at first, it seemed ok to me, but then I've noticed that whenever I increased the number of processors (from 16 to 32, for example), the program started to diverge due to a failure in the Line Search Newton's Method. Here is what a monitoring function prints out: nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear iterations = 16 nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear iterations = 32 nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear iterations = 48 Nonlinear solve did not converge due to DIVERGED_LS_FAILURE Indeed, one can see that the method is really diverging (for smaller tests, though, say N = 8 * 8 * 8, it converges). What's wrong here? Is it something with my code? If yes, how can I fix it? Best regards, Rafael -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 15 07:07:46 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 07:07:46 -0500 Subject: question about VecLoad (attached ) In-Reply-To: References: Message-ID: You do not have to use the AO. We always save to file in the natural ordering. Without this, we would not be able to load on different numbers of processors. Matt 2008/5/14 tsjb00 : > > Sorry the message gets messed up again! 
Please check the attached text file of my questions. > > Sorry for the inconvenience! Many thanks in advance! > > JB > ---------------------------------------- >> From: tsjb00 at hotmail.com >> To: petsc-users at mcs.anl.gov >> Subject: question about VecLoad (pls disregard previous one) >> Date: Wed, 14 May 2008 23:39:42 +0000 >> >> >> Sorry the previous message is wrong >> >> Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: >> for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ >> fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); >> i++; >> idx=i; >> AOApplicationToPetsc(ao,1,&idx); >> VecSetValue(v0,idx,var,INSERT_VALUES); >> } >> then use VecView to output the binary file: >> PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); >> VecView(v0,viewer); >> >> Please let me know if something is wrong. >> >> Many thanks! >> >> JB >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> > > _________________________________________________________________ > ?????????????MSN????TA????? > http://im.live.cn/emoticons/?ID=18 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From knepley at gmail.com Thu May 15 07:28:40 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 07:28:40 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Message-ID: 1) Are the linear systems really being solved in Newton? 2) What is the Bratu parameter? Turn it off and see that you get convergence in 1 iteration. Matt On Wed, May 14, 2008 at 10:30 PM, Rafael Santos Coelho wrote: > Hello everybody, > > I've coded a program which solves, in parallel, the three-dimensional Bratu > problem. Afterwards, I've run tests in a cluster to see how it would go and, > at first, it seemed ok to me, but then I've noticed that whenever I > increased the number of processors (from 16 to 32, for example), the program > started to diverge due to a failure in the Line Search Newton's Method. Here > is what a monitoring function prints out: > > nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear iterations = > 16 > nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear iterations = > 32 > nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear iterations = > 48 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Indeed, one can see that the method is really diverging (for smaller tests, > though, say N = 8 * 8 * 8, it converges). > > What's wrong here? Is it something with my code? If yes, how can I fix it? > > Best regards, > > Rafael > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From bsmith at mcs.anl.gov Thu May 15 09:04:52 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 May 2008 09:04:52 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Message-ID: <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> run with -ksp_monitor -ksp_converged_reason to see how the linear solver is working. Also try adding -ksp_rtol 1.e-10 to see if solving the linear system more accurately helps (it really shouldn't matter). You can also run with -info to get a lot more detailed information about the nonlinear solve and the line search. This is suppose to be an easy nonlinear problem so I would expect it to converge easily, there may be a slight error with your FormFunction() that starts to matter only for larger problems. Barry On May 14, 2008, at 10:30 PM, Rafael Santos Coelho wrote: > Hello everybody, > > I've coded a program which solves, in parallel, the three- > dimensional Bratu problem. Afterwards, I've run tests in a cluster > to see how it would go and, at first, it seemed ok to me, but then > I've noticed that whenever I increased the number of processors > (from 16 to 32, for example), the program started to diverge due to > a failure in the Line Search Newton's Method. Here is what a > monitoring function prints out: > > nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear > iterations = 16 > nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear > iterations = 32 > nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear > iterations = 48 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Indeed, one can see that the method is really diverging (for smaller > tests, though, say N = 8 * 8 * 8, it converges). > > What's wrong here? Is it something with my code? If yes, how can I > fix it? > > Best regards, > > Rafael > From tsjb00 at hotmail.com Thu May 15 10:29:03 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 15:29:03 +0000 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Many thanks to all for the reply! I use the code: PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); VecLoad(viewer,VECMPI,&v1); to get the vector storing values of var. My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? Thanks in advance! JB _________________________________________________________________ MSN ???????????????????? http://cn.msn.com From knepley at gmail.com Thu May 15 10:45:24 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 10:45:24 -0500 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: 2008/5/15 tsjb00 : > > Many thanks to all for the reply! > > I use the code: > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); > VecLoad(viewer,VECMPI,&v1); > to get the vector storing values of var. > > My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? The output vector, using VecView(), is in natural order. 
It has no idea about distribution over processes. When it is read in, using VecLoad(), it is redistributed according to the current partition. Matt > Thanks in advance! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From pflath at ices.utexas.edu Thu May 15 11:04:28 2008 From: pflath at ices.utexas.edu (Pearl Flath) Date: Thu, 15 May 2008 11:04:28 -0500 Subject: accessing iterative vectors in CG Message-ID: Hi, I'd like to do some additional calculations for another purpose with the iterative vectors in each step of the KSP CG solve. How do I get access to them? Pearl Flath -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 15 11:10:42 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 11:10:42 -0500 Subject: accessing iterative vectors in CG In-Reply-To: References: Message-ID: On Thu, May 15, 2008 at 11:04 AM, Pearl Flath wrote: > Hi, > I'd like to do some additional calculations for another purpose with the > iterative vectors in each step of the KSP CG solve. How do I get access to > them? We do not expose those vectors to the user in the interface. I think the easiest thing to do is copy the cg.c code into another solver, mycg.c and do the requisite calculations in that one. You register the solver in the same way as CG is registered in src/ksp/ksp/interface/itregis.c with a call to KSPRegisterDynamic(). Then from the command-line you can use -ksp_type mycg. Matt > Pearl Flath -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From dalcinl at gmail.com Thu May 15 11:38:19 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 15 May 2008 13:38:19 -0300 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Have you give a look to VecLoadIntoVector() ? For use it, you have to previously create the vector with the desired distribution (or get it from DA's), and then the values will be read and used to fill your vector... Of couse, you have to get shure that the ordering of indices is the same (and I believe it should be the 'natural' DA ordering), this is specially important if you run your problem with different number of processes. On 5/15/08, tsjb00 wrote: > > Many thanks to all for the reply! > > I use the code: > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); > VecLoad(viewer,VECMPI,&v1); > to get the vector storing values of var. > > My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? > > Thanks in advance! > > > JB > > _________________________________________________________________ > MSN ???????????????????? 
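A minimal sketch of the VecLoadIntoVector() approach described above, assuming the vector layout comes from the poster's DA (da, viewer and v1 are placeholders, and the names follow the petsc-2.3.x API):

  Vec         v1;
  PetscViewer viewer;

  DACreateGlobalVector(da, &v1);   /* v1 now has the DA's parallel layout    */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "out.dat", FILE_MODE_READ, &viewer);
  VecLoadIntoVector(viewer, v1);   /* fills v1 instead of creating a new Vec */
  PetscViewerDestroy(viewer);      /* newer releases take &viewer            */

As noted above, this assumes the ordering of the values in the file matches the DA's natural ordering.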
> http://cn.msn.com > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bsmith at mcs.anl.gov Thu May 15 11:43:08 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 May 2008 11:43:08 -0500 Subject: accessing iterative vectors in CG In-Reply-To: References: Message-ID: <006BE129-08AE-4A54-88E8-253EA69BA658@mcs.anl.gov> On May 15, 2008, at 11:10 AM, Matthew Knepley wrote: > On Thu, May 15, 2008 at 11:04 AM, Pearl Flath > wrote: >> Hi, >> I'd like to do some additional calculations for another purpose >> with the >> iterative vectors in each step of the KSP CG solve. How do I get >> access to >> them? > > We do not expose those vectors to the user in the interface. I think > the easiest > thing to do is copy the cg.c code into another solver, mycg.c and do > the requisite > calculations in that one. > > You register the solver in the same way as CG is registered in > src/ksp/ksp/interface/itregis.c > with a call to KSPRegisterDynamic(). Then from the command-line you > can use -ksp_type mycg. > > Matt > If you merely want to access the current solution, residual at each iteration when the convergence test or monitoring is done you could just provide your own monitor or convergence test. Barry >> Pearl Flath > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > From tsjb00 at hotmail.com Thu May 15 12:09:37 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 17:09:37 +0000 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Many thanks for the reply! This seems to work. ---------------------------------------- > Date: Thu, 15 May 2008 13:38:19 -0300 > From: dalcinl at gmail.com > To: petsc-users at mcs.anl.gov > Subject: Re: question about VecLoad (pls disregard previous one) > > Have you give a look to VecLoadIntoVector() ? For use it, you have to > previously create the vector with the desired distribution (or get it > from DA's), and then the values will be read and used to fill your > vector... Of couse, you have to get shure that the ordering of indices > is the same (and I believe it should be the 'natural' DA ordering), > this is specially important if you run your problem with different > number of processes. > > On 5/15/08, tsjb00 wrote: >> >> Many thanks to all for the reply! >> >> I use the code: >> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); >> VecLoad(viewer,VECMPI,&v1); >> to get the vector storing values of var. >> >> My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? >> >> Thanks in advance! >> >> >> JB >> >> _________________________________________________________________ >> MSN ???????????????????? 
>> http://cn.msn.com >> >> > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > _________________________________________________________________ ?????????????MSN????TA????? http://im.live.cn/emoticons/?ID=18 From rafaelsantoscoelho at gmail.com Thu May 15 21:21:56 2008 From: rafaelsantoscoelho at gmail.com (Rafael Santos Coelho) Date: Thu, 15 May 2008 23:21:56 -0300 Subject: Something weird with SNES convergence reason In-Reply-To: <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> Message-ID: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Hi people, thank you very much for the help. I couldn't fix the problem though... Matthew: 1) I guess so because for the vast majority of the tests carried out, the method converges and you can actually observe norm(F(x)) decreasing with few Newton iterations. For example: $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 -ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor 0 KSP Residual norm 5.960419091967e-01 1 KSP Residual norm 1.235318806330e+00 (...) Linear solve converged due to CONVERGED_RTOL iterations 56 0 KSP Residual norm 2.990541631546e-02 1 KSP Residual norm 1.332572441021e-02 (...) 29 KSP Residual norm 3.225505605549e-07 30 KSP Residual norm 1.658059885118e-07 Linear solve converged due to CONVERGED_RTOL iterations 30 0 KSP Residual norm 7.629434752036e-05 1 KSP Residual norm 1.413056976255e-05 (...) 21 KSP Residual norm 1.183900079277e-09 22 KSP Residual norm 6.010910804534e-10 Linear solve converged due to CONVERGED_RTOL iterations 22 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = 0 and u = 0 in all domain boundaries. u_x stands for the partial derivative of u with respect to the x variable, where u = u(x, y, z). For all the tests I made, d = 16 and lambda = 32. I tried setting d = 0, but the error continued. Barry: Consider $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 -ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor Here's the output: 0 KSP Residual norm 5.960419091967e-01 1 KSP Residual norm 1.235318806330e+00 (...) 55 KSP Residual norm 7.533575286046e-06 56 KSP Residual norm 4.924747432423e-06 Linear solve converged due to CONVERGED_RTOL iterations 56 0 KSP Residual norm 5.899667305071e-01 1 KSP Residual norm 1.233037780509e+00 (...) 56 KSP Residual norm 9.299650766487e-06 57 KSP Residual norm 5.541388445894e-06 Linear solve converged due to CONVERGED_RTOL iterations 57 0 KSP Residual norm 5.898541843665e-01 1 KSP Residual norm 1.230515227262e+00 (...) 57 KSP Residual norm 6.065473514455e-06 58 KSP Residual norm 3.255910272791e-06 Linear solve converged due to CONVERGED_RTOL iterations 58 Nonlinear solve did not converge due to DIVERGED_LS_FAILURE Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only difference is that the number of linear iterations per nonlinear iteration gets bigger (as one might have expected). I'm using the classic 7-point stencil finite difference approximation to discretize the PDE... 
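For concreteness, the interior-point residual for the PDE above with the 7-point stencil looks something like the fragment below (uniform spacings hx, hy, hz; x and f are the DA arrays of the current iterate and the residual). This is only an illustration of the discretization being described, not the poster's actual code:

  PetscScalar uxx, uyy, uzz, ux;

  uxx = (2.0*x[k][j][i] - x[k][j][i-1] - x[k][j][i+1]) / (hx*hx);
  uyy = (2.0*x[k][j][i] - x[k][j-1][i] - x[k][j+1][i]) / (hy*hy);
  uzz = (2.0*x[k][j][i] - x[k-1][j][i] - x[k+1][j][i]) / (hz*hz);
  ux  = (x[k][j][i+1] - x[k][j][i-1]) / (2.0*hx);          /* convection term */

  /* -Laplacian(u) + d*u_x - lambda*exp(u) at node (i,j,k) */
  f[k][j][i] = uxx + uyy + uzz + d*ux - lambda*exp(x[k][j][i]);

A hand-coded Jacobian for the same stencil has 2/hx^2 + 2/hy^2 + 2/hz^2 - lambda*exp(u_ijk) on the diagonal and -1/hx^2 + d/(2*hx), -1/hx^2 - d/(2*hx) on the east and west neighbours; the exp term and the convection term are the usual places for sign mistakes.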
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 16 07:46:47 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 07:46:47 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Message-ID: DIVERGED_LS_FAILURE means that the direction computed by solving - J(u^n)^{-1} F(u^n) is NOT a descent direction that is F(u^n - lambda * J(u^n)^{-1} F(u^n)) is not smaller than F(u^n) for 0 < lambda < 1. Since you are solving the linear system (in your last test) very accurately this almost always indicates the Jacobian is wrong. Please run with -snes_type test and see what it says. Also recheck your Jacobian code. If this does not help then take a look at src/snes/examples/tuturials/ex5.c and see how it can be run with fd_jacobian; you can try that with you code to track down any errors in the Jacobian. Barry On May 15, 2008, at 9:21 PM, Rafael Santos Coelho wrote: > Hi people, > > thank you very much for the help. I couldn't fix the problem though... > > Matthew: > > 1) I guess so because for the vast majority of the tests carried > out, the method converges and you can actually observe norm(F(x)) > decreasing with few Newton iterations. For example: > > $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 - > ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type > jacobi -ksp_monitor > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 2.990541631546e-02 > 1 KSP Residual norm 1.332572441021e-02 > (...) > 29 KSP Residual norm 3.225505605549e-07 > 30 KSP Residual norm 1.658059885118e-07 > Linear solve converged due to CONVERGED_RTOL iterations 30 > 0 KSP Residual norm 7.629434752036e-05 > 1 KSP Residual norm 1.413056976255e-05 > (...) > 21 KSP Residual norm 1.183900079277e-09 > 22 KSP Residual norm 6.010910804534e-10 > > Linear solve converged due to CONVERGED_RTOL iterations 22 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = > 0 and u = 0 in all domain boundaries. u_x stands for the partial > derivative of u with respect to the x variable, where u = u(x, y, > z). For all the tests I made, d = 16 and lambda = 32. I tried > setting d = 0, but the error continued. > > Barry: > > Consider > > $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 - > ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type > jacobi -ksp_monitor > > Here's the output: > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > 55 KSP Residual norm 7.533575286046e-06 > 56 KSP Residual norm 4.924747432423e-06 > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 5.899667305071e-01 > 1 KSP Residual norm 1.233037780509e+00 > (...) > 56 KSP Residual norm 9.299650766487e-06 > 57 KSP Residual norm 5.541388445894e-06 > Linear solve converged due to CONVERGED_RTOL iterations 57 > 0 KSP Residual norm 5.898541843665e-01 > 1 KSP Residual norm 1.230515227262e+00 > (...) 
> 57 KSP Residual norm 6.065473514455e-06 > 58 KSP Residual norm 3.255910272791e-06 > > Linear solve converged due to CONVERGED_RTOL iterations 58 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only > difference is that the number of linear iterations per nonlinear > iteration gets bigger (as one might have expected). > > I'm using the classic 7-point stencil finite difference > approximation to discretize the PDE... > From knepley at gmail.com Fri May 16 09:42:50 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 09:42:50 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Message-ID: On Thu, May 15, 2008 at 9:21 PM, Rafael Santos Coelho wrote: > Hi people, > > thank you very much for the help. I couldn't fix the problem though... > > Matthew: > > 1) I guess so because for the vast majority of the tests carried out, the > method converges and you can actually observe norm(F(x)) decreasing with few > Newton iterations. For example: > > $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 -ksp_converged_reason > -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 2.990541631546e-02 > 1 KSP Residual norm 1.332572441021e-02 > (...) > 29 KSP Residual norm 3.225505605549e-07 > 30 KSP Residual norm 1.658059885118e-07 > Linear solve converged due to CONVERGED_RTOL iterations 30 > 0 KSP Residual norm 7.629434752036e-05 > 1 KSP Residual norm 1.413056976255e-05 > (...) > 21 KSP Residual norm 1.183900079277e-09 > 22 KSP Residual norm 6.010910804534e-10 > > Linear solve converged due to CONVERGED_RTOL iterations 22 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = 0 and u > = 0 in all domain boundaries. u_x stands for the partial derivative of u > with respect to the x variable, where u = u(x, y, z). For all the tests I > made, d = 16 and lambda = 32. I tried setting d = 0, but the error > continued. There is a real problem with d past the birfurcation point. Make d < 6 and run again. Also, your code is wrong if d == 0 is a problem. Matt > Barry: > > Consider > > $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 -ksp_converged_reason > -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor > > Here's the output: > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > 55 KSP Residual norm 7.533575286046e-06 > 56 KSP Residual norm 4.924747432423e-06 > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 5.899667305071e-01 > 1 KSP Residual norm 1.233037780509e+00 > (...) > 56 KSP Residual norm 9.299650766487e-06 > 57 KSP Residual norm 5.541388445894e-06 > Linear solve converged due to CONVERGED_RTOL iterations 57 > 0 KSP Residual norm 5.898541843665e-01 > 1 KSP Residual norm 1.230515227262e+00 > (...) 
> 57 KSP Residual norm 6.065473514455e-06 > 58 KSP Residual norm 3.255910272791e-06 > > Linear solve converged due to CONVERGED_RTOL iterations 58 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only difference is > that the number of linear iterations per nonlinear iteration gets bigger (as > one might have expected). > > I'm using the classic 7-point stencil finite difference approximation to > discretize the PDE... -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gdiso at ustc.edu Fri May 16 10:47:12 2008 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 16 May 2008 23:47:12 +0800 Subject: mesh ordering and partition Message-ID: Hi, I am studying the parallel programming, using libmesh/petsc as an excellent example. I have some questions about the partition and mesh ordering. It seems libmesh does not reorder the mesh nodes. It only calls metis to partition the mesh, and uses original node order to build the matrix. I wonder if a bad mesh ordering may cause low efficency of ILU preconditioner. However, If I did RCM ordering to the mesh, the node's order may conflict with contiguous index set int the subdomain partitioned by metis. How should I balance the ordering (to reduce filling) and partition (to reduce communication)? any good ideas? Regards, Gong Ding From knepley at gmail.com Fri May 16 11:19:57 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 11:19:57 -0500 Subject: mesh ordering and partition In-Reply-To: References: Message-ID: On Fri, May 16, 2008 at 10:47 AM, Gong Ding wrote: > Hi, > I am studying the parallel programming, using libmesh/petsc as an excellent > example. > > I have some questions about the partition and mesh ordering. > > > > It seems libmesh does not reorder the mesh nodes. It only calls metis to > partition the mesh, and > > uses original node order to build the matrix. I wonder if a bad mesh > ordering may cause low efficency > > of ILU preconditioner. However, If I did RCM ordering to the mesh, the > node's order may conflict with > > contiguous index set int the subdomain partitioned by metis. How should I > balance the ordering (to reduce filling) > > and partition (to reduce communication)? any good ideas? I think, if you are using the serial PETSc ILU, you should just use a MatOrdering, which can be done from the command line: -pc_type ilu -pc_factor_mat_ordering_type rcm which I tested on KSP ex2. Matt > Regards, > > Gong Ding > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stephane.aubert at fluorem.com Fri May 16 11:22:18 2008 From: stephane.aubert at fluorem.com (Stephane Aubert) Date: Fri, 16 May 2008 18:22:18 +0200 Subject: GMRES left-preconditioned with ILU1 versus ILU0 Message-ID: <482DB4BA.70406@fluorem.com> Hi, I tried to improve the convergence of a rather badly conditioned linear system by increasing the ILU(k) level from k=0 to k=1. And, whereas I got a convergence for k=0, I ended up with an explosive divergence for k=1!!! The PETCS version is 2.3.2-p8. 
The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty blocks=297454) The common command line options are: * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart to get the condition number of the pre-conditioned system (if I understand correctly the man page of -ksp_singmonitor) * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more than 1 partition, but for the time being, only one partition is used. * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU For ILU(0), I'm using: * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks" and I got for convergence: 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/min 1 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/min 178.789 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/min 300.174 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/min 495.148 .... 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min 0.00612973 max/min 2.97075e+06 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min 0.00612913 max/min 2.97388e+06 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 max/min 2.97489e+06 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min 0.00612856 max/min 2.97538e+06 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min 0.00612594 max/min 2.98378e+06 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min 0.00612475 max/min 2.98437e+06 and the iterative solution compares very well with the one computed using complete LU factorization. For ILU(1), I'm using: * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill value. and I got for "convergence": 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min 1.68012e+135 max/min 1 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min 3.55953e+131 max/min 17825.9 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min 1.42732e+127 max/min 8.79538e+08 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min 4.82369e+125 max/min 2.99506e+10 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min 1.79839e+125 max/min 8.1095e+10 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min 5.16422e+122 max/min 2.82408e+13 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min 3.67337e+122 max/min 3.97521e+13 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e+122 max/min 1.42886e+14 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min 4.93314e+121 max/min 2.96237e+14 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min 4.89675e+121 max/min 3.00694e+14 My guess is that the ILU(1) is singular (zero as diagonal elements?), but I thought that the options "-sub_pc_factor_shift_nonzero -sub_pc_factor_pivot_in_blocks" were taking care of that... 
I got lost in the source files trying to find out who at the end is computing and applying the ILU for MPIBAIJ format (replaced by SEQBAIJ with only one partition, I'm guessing). The question is: What am I doing wrong? I never heard that ILU(1) was worst than ILU(0)! Stef. -- ___________________________________________________________ Dr. Stephane AUBERT, CEO & CTO FLUOREM s.a.s Centre Scientifique Auguste MOIROUX 64 chemin des MOUILLES F-69130 ECULLY, FRANCE International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 email: stephane.aubert at fluorem.com web: www.fluorem.com From knepley at gmail.com Fri May 16 11:34:43 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 11:34:43 -0500 Subject: GMRES left-preconditioned with ILU1 versus ILU0 In-Reply-To: <482DB4BA.70406@fluorem.com> References: <482DB4BA.70406@fluorem.com> Message-ID: On Fri, May 16, 2008 at 11:22 AM, Stephane Aubert wrote: > Hi, > I tried to improve the convergence of a rather badly conditioned linear > system by increasing the ILU(k) level from k=0 to k=1. > And, whereas I got a convergence for k=0, I ended up with an explosive > divergence for k=1!!! > The PETCS version is 2.3.2-p8. > The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty > blocks=297454) > The common command line options are: > > * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 > -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt > -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor > -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart > to get the condition number of the pre-conditioned system (if I > understand correctly the man page of -ksp_singmonitor) > * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more > than 1 partition, but for the time being, only one partition is used. > * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU > > For ILU(0), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks" > > and I got for convergence: > 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 > 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/min 1 > 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/min > 178.789 > 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/min > 300.174 > 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/min > 495.148 > .... > 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min 0.00612973 > max/min 2.97075e+06 > 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min 0.00612913 > max/min 2.97388e+06 > 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 max/min > 2.97489e+06 > 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min 0.00612856 > max/min 2.97538e+06 > 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min 0.00612594 > max/min 2.98378e+06 > 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min 0.00612475 > max/min 2.98437e+06 > > and the iterative solution compares very well with the one computed using > complete LU factorization. > > For ILU(1), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 > -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm > -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill value. 
> > and I got for "convergence": > 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 > 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min 1.68012e+135 > max/min 1 > 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min 3.55953e+131 > max/min 17825.9 > 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min 1.42732e+127 > max/min 8.79538e+08 > 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min 4.82369e+125 > max/min 2.99506e+10 > 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min 1.79839e+125 > max/min 8.1095e+10 > 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min 5.16422e+122 > max/min 2.82408e+13 > 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min 3.67337e+122 > max/min 3.97521e+13 > 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e+122 > max/min 1.42886e+14 > 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min 4.93314e+121 > max/min 2.96237e+14 > 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min 4.89675e+121 > max/min 3.00694e+14 > > My guess is that the ILU(1) is singular (zero as diagonal elements?), but I > thought that the options "-sub_pc_factor_shift_nonzero > -sub_pc_factor_pivot_in_blocks" were taking care of that... I got lost in > the source files trying to find out who at the end is computing and applying > the ILU for MPIBAIJ format (replaced by SEQBAIJ with only one partition, I'm > guessing). > > The question is: What am I doing wrong? I never heard that ILU(1) was worst > than ILU(0)! It is not uncommon. There are no theoretical guarantees for ILU(k), which is why I dislike it so much. ILU(1) can indeed be worse than ILU(0), depending on the type of matrix you have and the iterative solver used. Matt > Stef. > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Fri May 16 11:42:47 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 11:42:47 -0500 Subject: GMRES left-preconditioned with ILU1 versus ILU0 In-Reply-To: <482DB4BA.70406@fluorem.com> References: <482DB4BA.70406@fluorem.com> Message-ID: When monkeying with ILU it is always useful to run tests with - ksp_monitor_true_residual, because bad pivots in ILU can produce wildly scaled preconditioners so the preconditioned residual can be at a different scale then the actual residual. The block versions of the factorizations don't support the pc_factor_shiftXXXX options. You could try to add it. That is one nasty matrix, I don't know if you'll ever get reasonable performance with ILU. You may want to stick to direct solvers on each process (it may be slow and use lots of memory but at least it might work). Barry On May 16, 2008, at 11:22 AM, Stephane Aubert wrote: > Hi, > I tried to improve the convergence of a rather badly conditioned > linear system by increasing the ILU(k) level from k=0 to k=1. > And, whereas I got a convergence for k=0, I ended up with an > explosive divergence for k=1!!! > The PETCS version is 2.3.2-p8. 
> The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty > blocks=297454) > The common command line options are: > > * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 > -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt > -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor > -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart > to get the condition number of the pre-conditioned system (if I > understand correctly the man page of -ksp_singmonitor) > * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more > than 1 partition, but for the time being, only one partition is > used. > * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU > > For ILU(0), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm - > sub_pc_factor_pivot_in_blocks" > > and I got for convergence: > 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 > 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/ > min 1 > 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/ > min 178.789 > 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/ > min 300.174 > 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/ > min 495.148 > .... > 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min > 0.00612973 max/min 2.97075e+06 > 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min > 0.00612913 max/min 2.97388e+06 > 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 > max/min 2.97489e+06 > 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min > 0.00612856 max/min 2.97538e+06 > 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min > 0.00612594 max/min 2.98378e+06 > 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min > 0.00612475 max/min 2.98437e+06 > > and the iterative solution compares very well with the one computed > using complete LU factorization. > > For ILU(1), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 > -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm > -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill > value. 
> > and I got for "convergence": > 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 > 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min > 1.68012e+135 max/min 1 > 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min > 3.55953e+131 max/min 17825.9 > 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min > 1.42732e+127 max/min 8.79538e+08 > 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min > 4.82369e+125 max/min 2.99506e+10 > 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min > 1.79839e+125 max/min 8.1095e+10 > 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min > 5.16422e+122 max/min 2.82408e+13 > 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min > 3.67337e+122 max/min 3.97521e+13 > 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e > +122 max/min 1.42886e+14 > 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min > 4.93314e+121 max/min 2.96237e+14 > 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min > 4.89675e+121 max/min 3.00694e+14 > > My guess is that the ILU(1) is singular (zero as diagonal > elements?), but I thought that the options "- > sub_pc_factor_shift_nonzero -sub_pc_factor_pivot_in_blocks" were > taking care of that... I got lost in the source files trying to find > out who at the end is computing and applying the ILU for MPIBAIJ > format (replaced by SEQBAIJ with only one partition, I'm guessing). > > The question is: What am I doing wrong? I never heard that ILU(1) > was worst than ILU(0)! > Stef. > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > > From gdiso at ustc.edu Fri May 16 11:47:06 2008 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 17 May 2008 00:47:06 +0800 Subject: mesh ordering and partition References: Message-ID: <92116AFF53544591A677E9E5886DF53C@ustcatmel> ----- Original Message ----- From: "Matthew Knepley" To: Sent: Saturday, May 17, 2008 12:19 AM Subject: Re: mesh ordering and partition > > I think, if you are using the serial PETSc ILU, you should just use a > MatOrdering, > which can be done from the command line: > > -pc_type ilu -pc_factor_mat_ordering_type rcm > > which I tested on KSP ex2. > > Matt I am developing parallel code for 3D semiconductor device simulation. >From the experience of 2D code, the GMRES solver with ILU works well (the matrix is asymmetric.) As a result, I'd like to use GMRES+ILU again for 3D, in parallel. Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? Since the parallel martrix requires continuous index in subdomain, the matrix ordering seems troublesome. maybe only a local ordering can be done... Am I right? 
Gong Ding From bsmith at mcs.anl.gov Fri May 16 12:42:56 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 12:42:56 -0500 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: <67F2BC12-21B2-4BEF-9B16-3053B6E2A6E9@mcs.anl.gov> On May 16, 2008, at 11:47 AM, Gong Ding wrote: > > ----- Original Message ----- From: "Matthew Knepley" > > To: > Sent: Saturday, May 17, 2008 12:19 AM > Subject: Re: mesh ordering and partition > > >> >> I think, if you are using the serial PETSc ILU, you should just use a >> MatOrdering, >> which can be done from the command line: >> >> -pc_type ilu -pc_factor_mat_ordering_type rcm >> >> which I tested on KSP ex2. >> >> Matt > > > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well > (the matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, > the matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? > PETSc does not have any parallel ILU, so when you run in parallel you must be either using block Jacobi or the overlapping additive Schwarz method (block Jacobi with overlap between the blocks) and ILU on the subdomains. In this case you must use the -sub prefix on ilu and ordering >> -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm > The RCM ordering is done on the submatrix on each process, it is not parallel. It is important to note also that though rcm "may" improve the convergence rate of the ILU slightly, using an ordering on the factorization does require some permutation of the vectors on input and output to the MatSolve (which takes a little bit of time). You really need to run both and see if one is faster than the other (use -log_summary as an option). Barry > Gong Ding > > > From knepley at gmail.com Fri May 16 12:37:52 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 12:37:52 -0500 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: On Fri, May 16, 2008 at 11:47 AM, Gong Ding wrote: > > ----- Original Message ----- From: "Matthew Knepley" > To: > Sent: Saturday, May 17, 2008 12:19 AM > Subject: Re: mesh ordering and partition > > >> >> I think, if you are using the serial PETSc ILU, you should just use a >> MatOrdering, >> which can be done from the command line: >> >> -pc_type ilu -pc_factor_mat_ordering_type rcm >> >> which I tested on KSP ex2. >> >> Matt > > > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well (the > matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, the > matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? Its a local ordering. Remember that Block-Jacobi ILU is a LOT worse than serial ILU. I would not expect it to scale very well. You can try ASM to fix it up, but there are no guarantees. 
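Concretely, the two parallel variants discussed above can be compared with command lines along these lines (./app stands in for the user's executable; option names as in petsc-2.3.x):

  # block Jacobi with ILU + RCM ordering on each subdomain
  mpirun -np 4 ./app -ksp_type gmres -pc_type bjacobi \
         -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm \
         -ksp_converged_reason -log_summary

  # additive Schwarz (overlap 1) with the same subdomain solver
  mpirun -np 4 ./app -ksp_type gmres -pc_type asm -pc_asm_overlap 1 \
         -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm \
         -ksp_converged_reason -log_summary

Comparing the iteration counts from -ksp_converged_reason and the timings from -log_summary is the quickest way to see whether the RCM ordering and the extra overlap actually pay off, as suggested above.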
Matt > Gong Ding > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jed at 59A2.org Fri May 16 12:58:58 2008 From: jed at 59A2.org (Jed Brown) Date: Fri, 16 May 2008 19:58:58 +0200 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: <20080516175858.GH21713@brakk.ethz.ch> On Sat 2008-05-17 00:47, Gong Ding wrote: > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well (the > matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, the > matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? For parallel ILU, you could try -pc_type hypre -pc_hypre_type euclid. Unfortunately, ILU requires a lot of communication so the parallel scaling tends to be poor. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From w_subber at yahoo.com Tue May 20 14:12:20 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 12:12:20 -0700 (PDT) Subject: MatMerge_SeqsToMPI Message-ID: <601684.20346.qm@web38204.mail.mud.yahoo.com> Hi, I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up sparse sequential matrices (SeqAIJ) from each CPU. I am using MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse scall,Mat *mpimat) to do that. However, when I compile the code I get the following undefined reference to `matmerge_seqstompi_' collect2: ld returned 1 exit status make: *** [all] Error 1 Am I using this function correctly ? Thanks Waad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 20 14:55:47 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 14:55:47 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <601684.20346.qm@web38204.mail.mud.yahoo.com> References: <601684.20346.qm@web38204.mail.mud.yahoo.com> Message-ID: On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: > Hi, > > I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse > scall,Mat *mpimat) > > to do that. However, when I compile the code I get the following > > undefined reference to `matmerge_seqstompi_' > collect2: ld returned 1 exit status > make: *** [all] Error 1 > > Am I using this function correctly ? These have no Fortran bindings right now. Matt > Thanks > > Waad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Tue May 20 15:16:44 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 13:16:44 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <853663.98811.qm@web38203.mail.mud.yahoo.com> Thank you Matt, Any suggestion to solve the problem I am trying to tackle. 
I want to solve a linear system: Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. Where A_i is a sparse sequential matrix and f_i is a sequential vector. Each CPU has one matrix and one vector of the same size. Now I want to sum up and solve the system in parallel. Thanks again Waad Matthew Knepley wrote: On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: > Hi, > > I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse > scall,Mat *mpimat) > > to do that. However, when I compile the code I get the following > > undefined reference to `matmerge_seqstompi_' > collect2: ld returned 1 exit status > make: *** [all] Error 1 > > Am I using this function correctly ? These have no Fortran bindings right now. Matt > Thanks > > Waad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 20 15:49:59 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 20 May 2008 15:49:59 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <853663.98811.qm@web38203.mail.mud.yahoo.com> References: <853663.98811.qm@web38203.mail.mud.yahoo.com> Message-ID: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> On May 20, 2008, at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to > solve a linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > Where A_i is a sparse sequential matrix and f_i is a sequential > vector. Each CPU has one matrix and one vector of the same size. Now > I want to sum up and solve the system in parallel. Does each A_i have nonzero entries (mostly) associated with one part of the matrix? Or does each process have values scattered all around the matrix? In the former case you should simply create one parallel MPIAIJ matrix and call MatSetValues() to put the values into it. We don't have any kind of support for the later case, perhaps if you describe how the matrix entries come about someone would have suggestions on how to proceed. Barry > > > Thanks again > > Waad > > Matthew Knepley wrote: On Tue, May 20, 2008 at > 2:12 PM, Waad Subber wrote: > > Hi, > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > adding up > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > n,MatReuse > > scall,Mat *mpimat) > > > > to do that. However, when I compile the code I get the following > > > > undefined reference to `matmerge_seqstompi_' > > collect2: ld returned 1 exit status > > make: *** [all] Error 1 > > > > Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > > > Thanks > > > > Waad > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
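A minimal sketch of the approach Barry describes above (one parallel MPIAIJ matrix filled with MatSetValues); the sequential matrix Aseq, its global size N, and the preallocation counts are assumptions, and in this form every process may generate many off-process entries, so the assembly step carries the communication:

    #include "petscmat.h"

    /* Sum N x N sequential matrices A_i (one per process) into a single parallel
       MPIAIJ matrix.  The preallocation numbers (50 per row) are placeholders;
       a real code should count the nonzeros per row.                            */
    PetscErrorCode MergeSeqMats(MPI_Comm comm,Mat Aseq,PetscInt N,Mat *Apar)
    {
      Mat                A;
      PetscInt           i,ncols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscErrorCode     ierr;

      PetscFunctionBegin;
      ierr = MatCreateMPIAIJ(comm,PETSC_DECIDE,PETSC_DECIDE,N,N,50,PETSC_NULL,50,PETSC_NULL,&A);CHKERRQ(ierr);
      for (i=0; i<N; i++) {                      /* each process adds its own A_i */
        ierr = MatGetRow(Aseq,i,&ncols,&cols,&vals);CHKERRQ(ierr);
        if (ncols) {ierr = MatSetValues(A,1,&i,ncols,cols,vals,ADD_VALUES);CHKERRQ(ierr);}
        ierr = MatRestoreRow(Aseq,i,&ncols,&cols,&vals);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);  /* off-process entries are summed here */
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *Apar = A;
      PetscFunctionReturn(0);
    }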
> -- Norbert Wiener > > > From knepley at gmail.com Tue May 20 15:49:15 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 15:49:15 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <853663.98811.qm@web38203.mail.mud.yahoo.com> References: <853663.98811.qm@web38203.mail.mud.yahoo.com> Message-ID: Is there a reason not to assemble the whole system at once? This is usually much easier. You can even use the same indices with MatSetValuesLocal(). Matt On Tue, May 20, 2008 at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to solve a > linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > Where A_i is a sparse sequential matrix and f_i is a sequential vector. > Each CPU has one matrix and one vector of the same size. Now I want to sum > up and solve the system in parallel. > > Thanks again > > Waad > > Matthew Knepley wrote: > > On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: >> Hi, >> >> I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up >> sparse sequential matrices (SeqAIJ) from each CPU. I am using >> >> MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse >> scall,Mat *mpimat) >> >> to do that. However, when I compile the code I get the following >> >> undefined reference to `matmerge_seqstompi_' >> collect2: ld returned 1 exit status >> make: *** [all] Error 1 >> >> Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > >> Thanks >> >> Waad >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Tue May 20 19:15:25 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 17:15:25 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> Message-ID: <75545.31527.qm@web38205.mail.mud.yahoo.com> Thank you Matt and Barry, The system I am trying to solve is the interface problem in iterative substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is [R_i^T*g_i]. Each process constructs the local Schur complement matrix (S_i) , the restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential vector. Now having the Schur complement matrix for each subdomain, I need to solve the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. of process (subdomains) in parallel. For the global vector I construct one MPI vector and use VecGetArray () for each of the sequential vector then use VecSetValues () to add the values into the global MPI vector. That works fine. However for the global schur complement matix I try the same idea by creating one parallel MPIAIJ matrix and using MatGetArray( ) and MatSetValues () in order to add the values to the global matrix. MatGetArray( ) gives me only the values without indices, so I don't know how to add these valuse to the global MPI matrix. Thanks agin Waad Barry Smith wrote: On May 20, 2008, at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to > solve a linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. 
> > Where A_i is a sparse sequential matrix and f_i is a sequential > vector. Each CPU has one matrix and one vector of the same size. Now > I want to sum up and solve the system in parallel. Does each A_i have nonzero entries (mostly) associated with one part of the matrix? Or does each process have values scattered all around the matrix? In the former case you should simply create one parallel MPIAIJ matrix and call MatSetValues() to put the values into it. We don't have any kind of support for the later case, perhaps if you describe how the matrix entries come about someone would have suggestions on how to proceed. Barry > > > Thanks again > > Waad > > Matthew Knepley wrote: On Tue, May 20, 2008 at > 2:12 PM, Waad Subber wrote: > > Hi, > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > adding up > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > n,MatReuse > > scall,Mat *mpimat) > > > > to do that. However, when I compile the code I get the following > > > > undefined reference to `matmerge_seqstompi_' > > collect2: ld returned 1 exit status > > make: *** [all] Error 1 > > > > Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > > > Thanks > > > > Waad > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 20 19:30:06 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 19:30:06 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <75545.31527.qm@web38205.mail.mud.yahoo.com> References: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> <75545.31527.qm@web38205.mail.mud.yahoo.com> Message-ID: On Tue, May 20, 2008 at 7:15 PM, Waad Subber wrote: > Thank you Matt and Barry, > > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. > > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. > of process (subdomains) in parallel. Barry knows much more than me about substructuring, however: You could form this matrix in parallel, but I thought that involved significant communication. I would think that an unassembled form would be better. To do this you would 1) Construct the VecScatter from the set of local vectors to the global vector 2) Create a MatShell and put the VecScatter in the user context 3) For MatMult(), you would a) Scatter the input vector into the local copies b) Do each local MatMult() c) Scatter the result back to the output vector Some work, but not that complicated. Matt > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. 
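A rough sketch of the unassembled shell-matrix recipe Matt outlines above; the context structure, its field names, and the scatter are hypothetical, error handling is trimmed, and only the MatMult is shown:

    #include "petscksp.h"

    /* Hypothetical context: a local (sequential) operator, local work vectors,
       and a scatter between the global interface vector and the local vector. */
    typedef struct {
      Mat        Aloc;        /* local S_i (or any sequential operator) */
      Vec        xloc, yloc;  /* sequential work vectors of local size  */
      VecScatter scat;        /* global vector <-> local vector         */
    } ShellCtx;

    /* y = sum_i R_i^T A_i R_i x, applied without assembling the sum. */
    PetscErrorCode ShellMult(Mat S,Vec x,Vec y)
    {
      ShellCtx      *ctx;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatShellGetContext(S,(void**)&ctx);CHKERRQ(ierr);
      /* Note: in 2.3.x releases the VecScatter is the *last* argument of VecScatterBegin/End. */
      ierr = VecScatterBegin(ctx->scat,x,ctx->xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx->scat,x,ctx->xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = MatMult(ctx->Aloc,ctx->xloc,ctx->yloc);CHKERRQ(ierr);     /* local multiply */
      ierr = VecZeroEntries(y);CHKERRQ(ierr);
      ierr = VecScatterBegin(ctx->scat,ctx->yloc,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx->scat,ctx->yloc,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* ...  MatCreateShell(comm,m,m,M,M,(void*)ctx,&S);
            MatShellSetOperation(S,MATOP_MULT,(void(*)(void))ShellMult);  ... */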
> MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > >> Thank you Matt, >> >> Any suggestion to solve the problem I am trying to tackle. I want to >> solve a linear system: >> >> Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. >> >> Where A_i is a sparse sequential matrix and f_i is a sequential >> vector. Each CPU has one matrix and one vector of the same size. Now >> I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > >> >> >> Thanks again >> >> Waad >> >> Matthew Knepley wrote: On Tue, May 20, 2008 at >> 2:12 PM, Waad Subber wrote: >> > Hi, >> > >> > I am trying to construct a sparse parallel matrix (MPIAIJ) by >> adding up >> > sparse sequential matrices (SeqAIJ) from each CPU. I am using >> > >> > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt >> n,MatReuse >> > scall,Mat *mpimat) >> > >> > to do that. However, when I compile the code I get the following >> > >> > undefined reference to `matmerge_seqstompi_' >> > collect2: ld returned 1 exit status >> > make: *** [all] Error 1 >> > >> > Am I using this function correctly ? >> >> These have no Fortran bindings right now. >> >> Matt >> >> > Thanks >> > >> > Waad >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From dalcinl at gmail.com Tue May 20 19:44:32 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 20 May 2008 21:44:32 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <75545.31527.qm@web38205.mail.mud.yahoo.com> References: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> <75545.31527.qm@web38205.mail.mud.yahoo.com> Message-ID: On 5/20/08, Waad Subber wrote: > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. Two questions: 1) How do you actually get the local Schur complements. You explicitelly compute its entries, or do you compute it after computing the inverse (or LU factors) of a 'local' matrix? 2) Your R_i matrix is actually a matrix? In that case, it is a trivial restrinction operation with ones and zeros? Or R_i is actually a VecScatter? And finally: are you trying to apply a Krylov method over the global Schur complement? In such a case, are you going to implement a preconditioner for it? > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > .. i=1.. to No. 
of process (subdomains) in parallel. > > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > Thank you Matt, > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > solve a linear system: > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > vector. Each CPU has one matrix and one vector of the same size. Now > > I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > > > > > > > Thanks again > > > > Waad > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > 2:12 PM, Waad Subber wrote: > > > Hi, > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > adding up > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > n,MatReuse > > > scall,Mat *mpimat) > > > > > > to do that. However, when I compile the code I get the following > > > > > > undefined reference to `matmerge_seqstompi_' > > > collect2: ld returned 1 exit status > > > make: *** [all] Error 1 > > > > > > Am I using this function correctly ? > > > > These have no Fortran bindings right now. > > > > Matt > > > > > Thanks > > > > > > Waad > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Tue May 20 19:56:02 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 17:56:02 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <830531.15572.qm@web38206.mail.mud.yahoo.com> Thank you Matt It seems a god idea I 'll try it. Waad Matthew Knepley wrote: On Tue, May 20, 2008 at 7:15 PM, Waad Subber wrote: > Thank you Matt and Barry, > > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. 
> > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. > of process (subdomains) in parallel. Barry knows much more than me about substructuring, however: You could form this matrix in parallel, but I thought that involved significant communication. I would think that an unassembled form would be better. To do this you would 1) Construct the VecScatter from the set of local vectors to the global vector 2) Create a MatShell and put the VecScatter in the user context 3) For MatMult(), you would a) Scatter the input vector into the local copies b) Do each local MatMult() c) Scatter the result back to the output vector Some work, but not that complicated. Matt > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > >> Thank you Matt, >> >> Any suggestion to solve the problem I am trying to tackle. I want to >> solve a linear system: >> >> Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. >> >> Where A_i is a sparse sequential matrix and f_i is a sequential >> vector. Each CPU has one matrix and one vector of the same size. Now >> I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > >> >> >> Thanks again >> >> Waad >> >> Matthew Knepley wrote: On Tue, May 20, 2008 at >> 2:12 PM, Waad Subber wrote: >> > Hi, >> > >> > I am trying to construct a sparse parallel matrix (MPIAIJ) by >> adding up >> > sparse sequential matrices (SeqAIJ) from each CPU. I am using >> > >> > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt >> n,MatReuse >> > scall,Mat *mpimat) >> > >> > to do that. However, when I compile the code I get the following >> > >> > undefined reference to `matmerge_seqstompi_' >> > collect2: ld returned 1 exit status >> > make: *** [all] Error 1 >> > >> > Am I using this function correctly ? >> >> These have no Fortran bindings right now. >> >> Matt >> >> > Thanks >> > >> > Waad >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From w_subber at yahoo.com Tue May 20 20:06:27 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 18:06:27 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <855669.85397.qm@web38202.mail.mud.yahoo.com> Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. Two questions: 1) How do you actually get the local Schur complements. You explicitelly compute its entries, or do you compute it after computing the inverse (or LU factors) of a 'local' matrix? I construct the local Schur complement matrices after getting the inversion of A_II matrix for each subdomain. 2) Your R_i matrix is actually a matrix? In that case, it is a trivial restrinction operation with ones and zeros? Or R_i is actually a VecScatter? R_i is the restriction matrix maps the global boundary nodes to the local boundary nodes and its entries is zero and one I store it as spare matrix, so only I need to store the nonzero entries which one entry per a row And finally: are you trying to apply a Krylov method over the global Schur complement? In such a case, are you going to implement a preconditioner for it? Yes, that what I am trying to do > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > .. i=1.. to No. of process (subdomains) in parallel. > > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > Thank you Matt, > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > solve a linear system: > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > vector. Each CPU has one matrix and one vector of the same size. Now > > I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > > > > > > > Thanks again > > > > Waad > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > 2:12 PM, Waad Subber wrote: > > > Hi, > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > adding up > > > sparse sequential matrices (SeqAIJ) from each CPU. 
I am using > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > n,MatReuse > > > scall,Mat *mpimat) > > > > > > to do that. However, when I compile the code I get the following > > > > > > undefined reference to `matmerge_seqstompi_' > > > collect2: ld returned 1 exit status > > > make: *** [all] Error 1 > > > > > > Am I using this function correctly ? > > > > These have no Fortran bindings right now. > > > > Matt > > > > > Thanks > > > > > > Waad > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdettrick at gmail.com Wed May 21 11:14:41 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:14:41 -0700 Subject: mixed matrix type? Message-ID: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Hi, I have a sparse N*N matrix generated from a DA and a 5 point stencil, with a total of approx 5*N non-zero entries. Now I would like to extend this matrix by adding a smaller M*M dense matrix to the bottom right hand side, i.e. so that there is a dense square in the bottom right hand corner of the otherwise sparse matrix. The total number of new non-zero entries, M*M, is comparable to the total number of old entries, 5*N. On top of this, there would be a small number of non- zero entries in the new upper-right and lower-left rectangular portions of the matrix, due to coupling of the two systems. The new total matrix size (including zeroes) would be (N+M)*(N+M). Can anybody recommend a Mat type to store the new matrix? One possibility I was thinking of was to establish the original sparse Mat with a DA in a sub-communicator (with half the CPUs), and get the ownership range with MatGetOwnershipRange. Then in the petsc_comm_world communicator, the complete matrix could be constructed (by element-wise copying the old one I suppose), and the ownership range could be maintained manually. Does this sound like a reasonable strategy? I would very much appreciate any suggestions or advice. Thanks, Sean Dettrick From sdettrick at gmail.com Wed May 21 11:21:43 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:21:43 -0700 Subject: Fwd: mixed matrix type? References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: <7CB7A206-8DD3-49FB-9965-DD0F6E939058@gmail.com> I should have also mentioned that the matrix is symmetric. Thanks, Sean Begin forwarded message: > From: Sean Dettrick > Date: May 21, 2008 9:14:41 AM PDT > To: petsc-users at mcs.anl.gov > Subject: mixed matrix type? > > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point > stencil, with a total of approx 5*N non-zero entries. Now I would > like to extend this matrix by adding a smaller M*M dense matrix to > the bottom right hand side, i.e. so that there is a dense square in > the bottom right hand corner of the otherwise sparse matrix. The > total number of new non-zero entries, M*M, is comparable to the > total number of old entries, 5*N. 
On top of this, there would be a > small number of non-zero entries in the new upper-right and lower- > left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N > +M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original > sparse Mat with a DA in a sub-communicator (with half the CPUs), and > get the ownership range with MatGetOwnershipRange. Then in the > petsc_comm_world communicator, the complete matrix could be > constructed (by element-wise copying the old one I suppose), and the > ownership range could be maintained manually. > > Does this sound like a reasonable strategy? > > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 21 11:26:21 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 21 May 2008 11:26:21 -0500 Subject: mixed matrix type? In-Reply-To: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: We would like very much to have flexible code that did all this stuff for you, sadly we are far far away from this. I would suggest looking at DAGetMatrix() and pick the one that matches your DA (2d or 3d) etc. Then you could modify the code that preallocates the matrix with the additional preallocation information and then modify the part that puts in the locations of nonzeros to match your structure. This way you will get perfect preallocation (which will be important for you to get good speed). I would just use the MPIAIJ matrix format for now (and maybe forever is ok) because the inode code will make that smaller dense part pretty fast anyways without requiring massively complicated matrix data structures that store parts dense and parts sparse. Barry On May 21, 2008, at 11:14 AM, Sean Dettrick wrote: > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point > stencil, with a total of approx 5*N non-zero entries. Now I would > like to extend this matrix by adding a smaller M*M dense matrix to > the bottom right hand side, i.e. so that there is a dense square in > the bottom right hand corner of the otherwise sparse matrix. The > total number of new non-zero entries, M*M, is comparable to the > total number of old entries, 5*N. On top of this, there would be a > small number of non-zero entries in the new upper-right and lower- > left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N > +M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original > sparse Mat with a DA in a sub-communicator (with half the CPUs), and > get the ownership range with MatGetOwnershipRange. Then in the > petsc_comm_world communicator, the complete matrix could be > constructed (by element-wise copying the old one I suppose), and the > ownership range could be maintained manually. > > Does this sound like a reasonable strategy? > > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick > > From knepley at gmail.com Wed May 21 11:29:17 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 21 May 2008 11:29:17 -0500 Subject: mixed matrix type? 
In-Reply-To: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: On Wed, May 21, 2008 at 11:14 AM, Sean Dettrick wrote: > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point stencil, with a > total of approx 5*N non-zero entries. Now I would like to extend this > matrix by adding a smaller M*M dense matrix to the bottom right hand side, > i.e. so that there is a dense square in the bottom right hand corner of the > otherwise sparse matrix. The total number of new non-zero entries, M*M, is > comparable to the total number of old entries, 5*N. On top of this, there > would be a small number of non-zero entries in the new upper-right and > lower-left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N+M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original sparse Mat > with a DA in a sub-communicator (with half the CPUs), and get the ownership > range with MatGetOwnershipRange. Then in the petsc_comm_world communicator, > the complete matrix could be constructed (by element-wise copying the old > one I suppose), and the ownership range could be maintained manually. > > Does this sound like a reasonable strategy? Not really. If you only want parallelism suitable for the M matrix, I would use a Schur complement strategy: 1) Make the DA matrix 2) Make M, and the coupling matrices B, B^T 3) Make a MatShell that has a MatMult() that applies B^T A^{-1} B + M with A^{-1} done using a KSPSolve(). This seems easier to program and solve to me than the fully assembled one. We hope later to have something built to do the full assembly. Matt > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From sdettrick at gmail.com Wed May 21 11:45:23 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:45:23 -0700 Subject: mixed matrix type? In-Reply-To: References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: <9E9D647B-94FC-4F33-BFF0-8D150CA5509C@gmail.com> Thanks Matt, thanks Barry, much appreciated. Best, Sean From dalcinl at gmail.com Thu May 22 15:15:27 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 22 May 2008 17:15:27 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <855669.85397.qm@web38202.mail.mud.yahoo.com> References: <855669.85397.qm@web38202.mail.mud.yahoo.com> Message-ID: On 5/20/08, Waad Subber wrote: > 1) How do you actually get the local Schur complements. You > explicitelly compute its entries, or do you compute it after computing > the inverse (or LU factors) of a 'local' matrix? > > I construct the local Schur complement matrices after getting the inversion > of A_II matrix for each subdomain. Fine, > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > restrinction operation with ones and zeros? Or R_i is actually a > VecScatter? > > R_i is the restriction matrix maps the global boundary nodes to the local > boundary nodes and its entries is zero and one I store it as spare matrix, > so only I need to store the nonzero entries which one entry per a row I believe a VecScatter will perform much better for this task. 
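A minimal sketch of building that restriction as a VecScatter; the array bdry_global[] (global interface index of each local boundary node) and its length n_bdry are assumed to be known from the partitioning, and the IS/VecScatter calling sequences follow the 2.3.3-era API (newer releases add a PetscCopyMode argument to ISCreateGeneral and take a pointer in ISDestroy):

    #include "petscvec.h"

    PetscErrorCode BuildRestriction(Vec uGlobal,PetscInt n_bdry,const PetscInt bdry_global[],
                                    Vec *uLocal,VecScatter *scat)
    {
      IS             is_glob,is_loc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = VecCreateSeq(PETSC_COMM_SELF,n_bdry,uLocal);CHKERRQ(ierr);
      ierr = ISCreateGeneral(PETSC_COMM_SELF,n_bdry,bdry_global,&is_glob);CHKERRQ(ierr);
      ierr = ISCreateStride(PETSC_COMM_SELF,n_bdry,0,1,&is_loc);CHKERRQ(ierr);
      ierr = VecScatterCreate(uGlobal,is_glob,*uLocal,is_loc,scat);CHKERRQ(ierr);
      ierr = ISDestroy(is_glob);CHKERRQ(ierr);
      ierr = ISDestroy(is_loc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Applying R_i is then a forward scatter, and applying R_i^T (with the summation over subdomains) is a reverse scatter with ADD_VALUES, as in the shell multiply sketched earlier in this thread.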
> And finally: are you trying to apply a Krylov method over the global > Schur complement? In such a case, are you going to implement a > preconditioner for it? > > Yes, that what I am trying to do Well, please let me make some comments. I've spent many days and month optimizing Schur complement iterations, and I ended giving up. I was never able to get it perform better than ASM preconditioner (iff appropriatelly used, ie. solving local problems with LU, and implementing subdomain subpartitioning the smart way, not the way currently implemented in PETSc, were subpartitioning is done by chunks of continuous rows). If you are doing research on this, I would love to know your conclusion when you get your work done. If you are doing all this just with the hope of getting better running times, well, remember my above comments but also remember that I do not consider myself a smart guy ;-) As I said before, I worked hard for implementing general Schur complement iteration. All this code is avalable in the SVN repository of petsc4py (PETSc for Python), but it could be easily stripped out for use in any PETSc-based code in C/C++. This implementation requires the use of a MATIS matrix type (there is also a separate implementation for MATMPIAIJ maatrices), I've implemented subdomain subpartitioning (using a simple recursive graph splitting procedure reusing matrix reordering routines built-in in PETSc, could be done better with METIS); when the A_ii problems are large, their LU factorization can be a real bootleneck. I've even implemented a global preconditioner operation for the interface problem, based on iterating over a 'strip' of nodes around the interface; it improves convergence and is usefull for ill-conditioned systems, but the costs are increased. If you ever want to take a look at my implemention for try to use it, or perhaps take ideas for your own implementation, let me know. > > Now having the Schur complement matrix for each subdomain, I need to solve > > the interface problem > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > .. i=1.. to No. of process (subdomains) in parallel. > > > > For the global vector I construct one MPI vector and use VecGetArray () > for > > each of the sequential vector then use VecSetValues () to add the values > > into the global MPI vector. That works fine. > > > > However for the global schur complement matix I try the same idea by > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > MatSetValues () in order to add the values to the global matrix. > > MatGetArray( ) gives me only the values without indices, so I don't know > how > > to add these valuse to the global MPI matrix. > > > > Thanks agin > > > > Waad > > > > Barry Smith wrote: > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > Thank you Matt, > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > solve a linear system: > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > I want to sum up and solve the system in parallel. > > > > Does each A_i have nonzero entries (mostly) associated with one > > part of the matrix? Or does each process have values > > scattered all around the matrix? > > > > In the former case you should simply create one parallel MPIAIJ > > matrix and call MatSetValues() to put the values > > into it. 
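Related to the Schur complement discussion in this thread, and to Matt's earlier point that an inverse can be applied through an inner KSPSolve() inside a shell matrix, here is a hypothetical sketch of applying a subdomain Schur complement S = A_BB - A_BI A_II^{-1} A_IB without ever forming the inverse of A_II; the block names and work vectors are assumptions, and the inner KSP would typically be set to a direct factorization (KSPPREONLY with PCLU), as also recommended later in the thread:

    #include "petscksp.h"

    typedef struct {
      Mat A_BB,A_BI,A_IB;   /* blocks of the local subdomain matrix      */
      KSP kspI;             /* inner solver for A_II                     */
      Vec wI1,wI2,wB;       /* interior- and boundary-sized work vectors */
    } SchurCtx;

    /* y = A_BB x - A_BI * ( A_II^{-1} * (A_IB x) ) */
    PetscErrorCode SchurMult(Mat S,Vec x,Vec y)
    {
      SchurCtx      *c;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatShellGetContext(S,(void**)&c);CHKERRQ(ierr);
      ierr = MatMult(c->A_IB,x,c->wI1);CHKERRQ(ierr);         /* wI1 = A_IB x        */
      ierr = KSPSolve(c->kspI,c->wI1,c->wI2);CHKERRQ(ierr);   /* wI2 = A_II^{-1} wI1 */
      ierr = MatMult(c->A_BI,c->wI2,c->wB);CHKERRQ(ierr);     /* wB  = A_BI wI2      */
      ierr = MatMult(c->A_BB,x,y);CHKERRQ(ierr);              /* y   = A_BB x        */
      ierr = VecAXPY(y,-1.0,c->wB);CHKERRQ(ierr);             /* y  -= wB            */
      PetscFunctionReturn(0);
    }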
We don't have any kind of support for the later case, perhaps > > if you describe how the matrix entries come about someone > > would have suggestions on how to proceed. > > > > Barry > > > > > > > > > > > Thanks again > > > > > > Waad > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > 2:12 PM, Waad Subber wrote: > > > > Hi, > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > adding up > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > n,MatReuse > > > > scall,Mat *mpimat) > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > collect2: ld returned 1 exit status > > > > make: *** [all] Error 1 > > > > > > > > Am I using this function correctly ? > > > > > > These have no Fortran bindings right now. > > > > > > Matt > > > > > > > Thanks > > > > > > > > Waad > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From amjad11 at gmail.com Fri May 23 00:30:42 2008 From: amjad11 at gmail.com (amjad ali) Date: Fri, 23 May 2008 10:30:42 +0500 Subject: general question on speed using quad core Xeons In-Reply-To: References: <48054602.9040200@gmail.com> Message-ID: <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> Hello all, specially Dr. Matt, On 4/16/08, Matthew Knepley wrote: > > On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie > wrote: > > I'm running my PETSc code on a cluster of quad core Xeon's connected > > by Infiniband. I hadn't much worried about the performance, because > > everything seemed to be working quite well, but today I was actually > > comparing performance (wall clock time) for the same problem, but on > > different combinations of CPUS. > > > > I find that my PETSc code is quite scalable until I start to use > > multiple cores/cpu. > > > > For example, the run time doesn't improve by going from 1 core/cpu > > to 4 cores/cpu, and I find this to be very strange, especially since > > looking at top or Ganglia, all 4 cpus on each node are running at 100% > > almost > > all of the time. I would have thought if the cpus were going all out, > > that I would still be getting much more scalable results. > > Those a really coarse measures. There is absolutely no way that all cores > are going 100%. Its easy to show by hand. Take the peak flop rate and > this gives you the bandwidth needed to sustain that computation (if > everything is perfect, like axpy). You will find that the chip bandwidth > is far below this. 
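As a rough illustration of that argument (the numbers are only for scale, not measurements): a double-precision axpy y = y + a*x performs 2 flops per entry while moving 3 doubles, that is 24 bytes, giving an arithmetic intensity of 1/12 flop per byte. A core with a nominal peak of, say, 10 GFlop/s would therefore need about 120 GB/s of sustained memory bandwidth to run axpy at peak, while a whole multi-core socket of that era typically sustains on the order of 5 to 10 GB/s; sparse matrix-vector products sit in the same regime, which is why adding cores that share one memory bus does not make them proportionally faster.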
A nice analysis is in > > http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf > > > We are using mvapich-0.9.9 with infiniband. So, I don't know if > > this is a cluster/Xeon issue, or something else. > > This is actually mathematics! How satisfying. The only way to improve > this is to change the data structure (e.g. use blocks) or change the > algorithm (e.g. use spectral elements and unassembled structures) Would you please explain a bit about "unassembled structures"? Does Discontinuous Galerkin Method falls into this category? Thanks and Regrads, Amjad Ali. Matt > > > Anybody with experience on this? > > > > Thanks, Randy M. > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri May 23 02:52:13 2008 From: jed at 59A2.org (Jed Brown) Date: Fri, 23 May 2008 09:52:13 +0200 Subject: general question on speed using quad core Xeons In-Reply-To: <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> References: <48054602.9040200@gmail.com> <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> Message-ID: <20080523075213.GE21713@brakk.ethz.ch> On Fri 2008-05-23 10:30, amjad ali wrote: > Would you please explain a bit about "unassembled structures"? > Does Discontinuous Galerkin Method falls into this category? I'm doing some work on this so I'll try to answer. There are two components which can be ``unassembled'' namely the matrix application and the preconditioner. In general, unassembled makes the most sense for semi-structured approximations where there is natural data granularity similar to L1 cache size. A standard example is the spectral or p-version finite element method. In these methods, the element stiffness matrix is dense and can be very large, but it is possible to apply it without storing the entries. For instance, suppose we have polynomial order p-1 on a hexahedral element. Then there are p^3 element degrees of freedom and the global matrix will have p^6 nonzeros contributed by this element (if we assemble it). For p=10, this is already big and will make preconditioning very expensive. On the other hand, we can apply the element Jacobian in O(p^3) space and O(p^4) time if we exploit a tensor product basis. Even if we don't use tensor products, the space requirement can stay the same. For high order p, this will be a lot less operations, but the important point with regard to FLOP/s is that there is now more work per amount of memory touched. If there are N elements, the total number of degrees of freedom is O(N p^3) and the matrix has O(N p^6) entries so multiplication is O(N p^6) time and space. If applied locally (i.e. scatter to local basis, apply there, scatter back) the space requirement is O(N p^3) and the time is O(N p^4) or O(N p^6) depending on the basis. In addition, the element operations can often be done entirely in L1 cache. Clearly this puts a lot less stress on the memory bus. Of course, just applying the Jacobian fast won't cut it, we also need a preconditioner. One way is to assemble a sparser approximation to the global Jacobian and apply standard preconditioners. Another is to apply a domain decomposition preconditioner which exploits the data granularity. For instance, Schur complement preconditioners can be formed based on solves on a single element. 
These are most attractive when there is a `fast' way to solve the local problem on a single element, but they can be expected to increase the FLOP/s rate either way because the memory access pattern requires more work for a given amount of memory. (I don't know a lot about DD preconditioners. I'm using the former approach, assembling a sparser approximation of the Jacobian.) Discontinuous Galerkin happens to be easy to implement for high order elements and the scatter from global to local basis is trivial. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From w_subber at yahoo.com Fri May 23 11:20:14 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 09:20:14 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <212555.70047.qm@web38205.mail.mud.yahoo.com> Thank you for the useful comments. For sure I will consider them. I've just starting my research by writing a DDM substructuring code which scales for now up-to 60 CPUs using petsc KSP solver for the interface problem and Lapack direct factorization for the interior problem. I split the domain using METIS library and assign each subdomain to one process then solve the global Schur complement using parallel preconditioned iterative solver. As an initial attempt, I solved a 2D elasticity problem (about 100,000 DOFs) within seconds using this algorithm. I notice Lapack solver for the interior problem takes a lot of time compare to the iterative solver for the interface, so now I am replacing the direct factorization with petsc KSP solver. I would like very much to have a look at your implementation, and I think that will be very useful to me. Thanks Waad Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > 1) How do you actually get the local Schur complements. You > explicitelly compute its entries, or do you compute it after computing > the inverse (or LU factors) of a 'local' matrix? > > I construct the local Schur complement matrices after getting the inversion > of A_II matrix for each subdomain. Fine, > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > restrinction operation with ones and zeros? Or R_i is actually a > VecScatter? > > R_i is the restriction matrix maps the global boundary nodes to the local > boundary nodes and its entries is zero and one I store it as spare matrix, > so only I need to store the nonzero entries which one entry per a row I believe a VecScatter will perform much better for this task. > And finally: are you trying to apply a Krylov method over the global > Schur complement? In such a case, are you going to implement a > preconditioner for it? > > Yes, that what I am trying to do Well, please let me make some comments. I've spent many days and month optimizing Schur complement iterations, and I ended giving up. I was never able to get it perform better than ASM preconditioner (iff appropriatelly used, ie. solving local problems with LU, and implementing subdomain subpartitioning the smart way, not the way currently implemented in PETSc, were subpartitioning is done by chunks of continuous rows). If you are doing research on this, I would love to know your conclusion when you get your work done. 
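To make Jed's operation count above concrete, here is a purely illustrative sum-factorization sketch for a single p^3 element: y(i,j,k) = sum_{l,m,n} A1(i,l) A2(j,m) A3(k,n) x(l,m,n) is applied in three sweeps of O(p^4) work and O(p^3) storage, instead of multiplying by the assembled p^3-by-p^3 element matrix at O(p^6); all names here are hypothetical and the routine is independent of PETSc:

    /* A1,A2,A3 are p x p (row major); x,y,t1,t2 hold p^3 doubles,
       with index (i,j,k) stored at position (i*p+j)*p+k.           */
    static void tensor_apply(int p,const double *A1,const double *A2,const double *A3,
                             const double *x,double *y,double *t1,double *t2)
    {
      int i,j,k,l; double s;
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* first direction  */
        for (s=0,l=0;l<p;l++) s += A1[i*p+l]*x[(l*p+j)*p+k];
        t1[(i*p+j)*p+k] = s;
      }
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* second direction */
        for (s=0,l=0;l<p;l++) s += A2[j*p+l]*t1[(i*p+l)*p+k];
        t2[(i*p+j)*p+k] = s;
      }
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* third direction  */
        for (s=0,l=0;l<p;l++) s += A3[k*p+l]*t2[(i*p+j)*p+l];
        y[(i*p+j)*p+k] = s;
      }
    }

Each sweep does p multiply-adds for each of the p^3 outputs, so the total work is 3 p^4 flops while only O(p^3) data is touched, which is the higher work-per-byte ratio discussed above.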
If you are doing all this just with the hope of getting better running times, well, remember my above comments but also remember that I do not consider myself a smart guy ;-) As I said before, I worked hard for implementing general Schur complement iteration. All this code is avalable in the SVN repository of petsc4py (PETSc for Python), but it could be easily stripped out for use in any PETSc-based code in C/C++. This implementation requires the use of a MATIS matrix type (there is also a separate implementation for MATMPIAIJ maatrices), I've implemented subdomain subpartitioning (using a simple recursive graph splitting procedure reusing matrix reordering routines built-in in PETSc, could be done better with METIS); when the A_ii problems are large, their LU factorization can be a real bootleneck. I've even implemented a global preconditioner operation for the interface problem, based on iterating over a 'strip' of nodes around the interface; it improves convergence and is usefull for ill-conditioned systems, but the costs are increased. If you ever want to take a look at my implemention for try to use it, or perhaps take ideas for your own implementation, let me know. > > Now having the Schur complement matrix for each subdomain, I need to solve > > the interface problem > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > .. i=1.. to No. of process (subdomains) in parallel. > > > > For the global vector I construct one MPI vector and use VecGetArray () > for > > each of the sequential vector then use VecSetValues () to add the values > > into the global MPI vector. That works fine. > > > > However for the global schur complement matix I try the same idea by > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > MatSetValues () in order to add the values to the global matrix. > > MatGetArray( ) gives me only the values without indices, so I don't know > how > > to add these valuse to the global MPI matrix. > > > > Thanks agin > > > > Waad > > > > Barry Smith wrote: > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > Thank you Matt, > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > solve a linear system: > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > I want to sum up and solve the system in parallel. > > > > Does each A_i have nonzero entries (mostly) associated with one > > part of the matrix? Or does each process have values > > scattered all around the matrix? > > > > In the former case you should simply create one parallel MPIAIJ > > matrix and call MatSetValues() to put the values > > into it. We don't have any kind of support for the later case, perhaps > > if you describe how the matrix entries come about someone > > would have suggestions on how to proceed. > > > > Barry > > > > > > > > > > > Thanks again > > > > > > Waad > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > 2:12 PM, Waad Subber wrote: > > > > Hi, > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > adding up > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > n,MatReuse > > > > scall,Mat *mpimat) > > > > > > > > to do that. 
However, when I compile the code I get the following > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > collect2: ld returned 1 exit status > > > > make: *** [all] Error 1 > > > > > > > > Am I using this function correctly ? > > > > > > These have no Fortran bindings right now. > > > > > > Matt > > > > > > > Thanks > > > > > > > > Waad > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 23 13:10:48 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 23 May 2008 13:10:48 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <212555.70047.qm@web38205.mail.mud.yahoo.com> References: <212555.70047.qm@web38205.mail.mud.yahoo.com> Message-ID: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> On May 23, 2008, at 11:20 AM, Waad Subber wrote: > Thank you for the useful comments. For sure I will consider them. > I've just starting my research by writing a DDM substructuring code > which scales for now up-to 60 CPUs using petsc KSP solver for the > interface problem and Lapack direct factorization for the interior > problem. I split the domain using METIS library and assign each > subdomain to one process then solve the global Schur complement > using parallel preconditioned iterative solver. As an initial > attempt, I solved a 2D elasticity problem (about 100,000 DOFs) > within seconds using this algorithm. I notice Lapack solver for the > interior problem takes a lot of time compare to the iterative solver > for the interface, so now I am replacing the direct factorization > with petsc KSP solver. In my experience you will want to use a KSP direct solve, for example just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur complement is always way to slow. Barry > > > I would like very much to have a look at your implementation, and I > think that will be very useful to me. > > Thanks > > Waad > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber > wrote: > > 1) How do you actually get the local Schur complements. You > > explicitelly compute its entries, or do you compute it after > computing > > the inverse (or LU factors) of a 'local' matrix? > > > > I construct the local Schur complement matrices after getting the > inversion > > of A_II matrix for each subdomain. > > Fine, > > > 2) Your R_i matrix is actually a matrix? In that case, it is a > trivial > > restrinction operation with ones and zeros? Or R_i is actually a > > VecScatter? 
> > > > R_i is the restriction matrix maps the global boundary nodes to > the local > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > so only I need to store the nonzero entries which one entry per a > row > > I believe a VecScatter will perform much better for this task. > > > > And finally: are you trying to apply a Krylov method over the global > > Schur complement? In such a case, are you going to implement a > > preconditioner for it? > > > > Yes, that what I am trying to do > > Well, please let me make some comments. I've spent many days and month > optimizing Schur complement iterations, and I ended giving up. I was > never able to get it perform better than ASM preconditioner (iff > appropriatelly used, ie. solving local problems with LU, and > implementing subdomain subpartitioning the smart way, not the way > currently implemented in PETSc, were subpartitioning is done by chunks > of continuous rows). > > If you are doing research on this, I would love to know your > conclusion when you get your work done. If you are doing all this just > with the hope of getting better running times, well, remember my above > comments but also remember that I do not consider myself a smart guy > ;-) > > As I said before, I worked hard for implementing general Schur > complement iteration. All this code is avalable in the SVN repository > of petsc4py (PETSc for Python), but it could be easily stripped out > for use in any PETSc-based code in C/C++. This implementation requires > the use of a MATIS matrix type (there is also a separate > implementation for MATMPIAIJ maatrices), I've implemented subdomain > subpartitioning (using a simple recursive graph splitting procedure > reusing matrix reordering routines built-in in PETSc, could be done > better with METIS); when the A_ii problems are large, their LU > factorization can be a real bootleneck. I've even implemented a > global preconditioner operation for the interface problem, based on > iterating over a 'strip' of nodes around the interface; it improves > convergence and is usefull for ill-conditioned systems, but the costs > are increased. > > If you ever want to take a look at my implemention for try to use it, > or perhaps take ideas for your own implementation, let me know. > > > > > > > > Now having the Schur complement matrix for each subdomain, I > need to solve > > > the interface problem > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > For the global vector I construct one MPI vector and use > VecGetArray () > > for > > > each of the sequential vector then use VecSetValues () to add > the values > > > into the global MPI vector. That works fine. > > > > > > However for the global schur complement matix I try the same > idea by > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > MatSetValues () in order to add the values to the global matrix. > > > MatGetArray( ) gives me only the values without indices, so I > don't know > > how > > > to add these valuse to the global MPI matrix. > > > > > > Thanks agin > > > > > > Waad > > > > > > Barry Smith wrote: > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > Thank you Matt, > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I > want to > > > > solve a linear system: > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > vector. 
Each CPU has one matrix and one vector of the same > size. Now > > > > I want to sum up and solve the system in parallel. > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > part of the matrix? Or does each process have values > > > scattered all around the matrix? > > > > > > In the former case you should simply create one parallel MPIAIJ > > > matrix and call MatSetValues() to put the values > > > into it. We don't have any kind of support for the later case, > perhaps > > > if you describe how the matrix entries come about someone > > > would have suggestions on how to proceed. > > > > > > Barry > > > > > > > > > > > > > > > Thanks again > > > > > > > > Waad > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > 2:12 PM, Waad Subber wrote: > > > > > Hi, > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > adding up > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt > m,PetscInt > > > > n,MatReuse > > > > > scall,Mat *mpimat) > > > > > > > > > > to do that. However, when I compile the code I get the > following > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > collect2: ld returned 1 exit status > > > > > make: *** [all] Error 1 > > > > > > > > > > Am I using this function correctly ? > > > > > > > > These have no Fortran bindings right now. > > > > > > > > Matt > > > > > > > > > Thanks > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to > which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a > (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > From w_subber at yahoo.com Fri May 23 13:35:58 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 11:35:58 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> Message-ID: <673926.73668.qm@web38205.mail.mud.yahoo.com> Excellent...! thank you Barry Waad :) Barry Smith wrote: On May 23, 2008, at 11:20 AM, Waad Subber wrote: > Thank you for the useful comments. For sure I will consider them. > I've just starting my research by writing a DDM substructuring code > which scales for now up-to 60 CPUs using petsc KSP solver for the > interface problem and Lapack direct factorization for the interior > problem. I split the domain using METIS library and assign each > subdomain to one process then solve the global Schur complement > using parallel preconditioned iterative solver. As an initial > attempt, I solved a 2D elasticity problem (about 100,000 DOFs) > within seconds using this algorithm. 
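To illustrate Barry's advice above (one parallel MPIAIJ matrix filled with MatSetValues so that the assembled result is Sum(A_i)), here is a rough C sketch. It is not either author's code: Aseq is the local sequential matrix, loc2glob[] a hypothetical local-to-global index map, and the sizes and preallocation counts (mlocal, nlocal, M, N, d_nz, o_nz) are application-dependent assumptions. Error checking is omitted.

   #include "petscmat.h"

   Mat                A;          /* the global MPIAIJ matrix */
   PetscInt           i, j, nrows, ncols, grow, *gcols;
   const PetscInt    *cols;
   const PetscScalar *vals;

   MatCreateMPIAIJ(PETSC_COMM_WORLD, mlocal, nlocal, M, N,
                   d_nz, PETSC_NULL, o_nz, PETSC_NULL, &A); /* called MatCreateAIJ in
                                                               recent PETSc releases  */

   MatGetSize(Aseq, &nrows, PETSC_NULL);
   for (i = 0; i < nrows; i++) {
     MatGetRow(Aseq, i, &ncols, &cols, &vals);
     PetscMalloc(ncols * sizeof(PetscInt), &gcols);
     grow = loc2glob[i];                                /* hypothetical index map     */
     for (j = 0; j < ncols; j++) gcols[j] = loc2glob[cols[j]];
     MatSetValues(A, 1, &grow, ncols, gcols, vals, ADD_VALUES); /* overlapping entries
                                                                   are summed          */
     PetscFree(gcols);
     MatRestoreRow(Aseq, i, &ncols, &cols, &vals);
   }
   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

With ADD_VALUES, entries contributed by several processes are summed during assembly, which is exactly the Sum(A_i) operation asked about; good preallocation (d_nz/o_nz or the per-row arrays) matters a great deal for the speed of MatSetValues.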
I notice Lapack solver for the > interior problem takes a lot of time compare to the iterative solver > for the interface, so now I am replacing the direct factorization > with petsc KSP solver. In my experience you will want to use a KSP direct solve, for example just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur complement is always way to slow. Barry > > > I would like very much to have a look at your implementation, and I > think that will be very useful to me. > > Thanks > > Waad > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber > wrote: > > 1) How do you actually get the local Schur complements. You > > explicitelly compute its entries, or do you compute it after > computing > > the inverse (or LU factors) of a 'local' matrix? > > > > I construct the local Schur complement matrices after getting the > inversion > > of A_II matrix for each subdomain. > > Fine, > > > 2) Your R_i matrix is actually a matrix? In that case, it is a > trivial > > restrinction operation with ones and zeros? Or R_i is actually a > > VecScatter? > > > > R_i is the restriction matrix maps the global boundary nodes to > the local > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > so only I need to store the nonzero entries which one entry per a > row > > I believe a VecScatter will perform much better for this task. > > > > And finally: are you trying to apply a Krylov method over the global > > Schur complement? In such a case, are you going to implement a > > preconditioner for it? > > > > Yes, that what I am trying to do > > Well, please let me make some comments. I've spent many days and month > optimizing Schur complement iterations, and I ended giving up. I was > never able to get it perform better than ASM preconditioner (iff > appropriatelly used, ie. solving local problems with LU, and > implementing subdomain subpartitioning the smart way, not the way > currently implemented in PETSc, were subpartitioning is done by chunks > of continuous rows). > > If you are doing research on this, I would love to know your > conclusion when you get your work done. If you are doing all this just > with the hope of getting better running times, well, remember my above > comments but also remember that I do not consider myself a smart guy > ;-) > > As I said before, I worked hard for implementing general Schur > complement iteration. All this code is avalable in the SVN repository > of petsc4py (PETSc for Python), but it could be easily stripped out > for use in any PETSc-based code in C/C++. This implementation requires > the use of a MATIS matrix type (there is also a separate > implementation for MATMPIAIJ maatrices), I've implemented subdomain > subpartitioning (using a simple recursive graph splitting procedure > reusing matrix reordering routines built-in in PETSc, could be done > better with METIS); when the A_ii problems are large, their LU > factorization can be a real bootleneck. I've even implemented a > global preconditioner operation for the interface problem, based on > iterating over a 'strip' of nodes around the interface; it improves > convergence and is usefull for ill-conditioned systems, but the costs > are increased. > > If you ever want to take a look at my implemention for try to use it, > or perhaps take ideas for your own implementation, let me know. > > > > > > > > Now having the Schur complement matrix for each subdomain, I > need to solve > > > the interface problem > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > .. i=1.. to No. 
of process (subdomains) in parallel. > > > > > > For the global vector I construct one MPI vector and use > VecGetArray () > > for > > > each of the sequential vector then use VecSetValues () to add > the values > > > into the global MPI vector. That works fine. > > > > > > However for the global schur complement matix I try the same > idea by > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > MatSetValues () in order to add the values to the global matrix. > > > MatGetArray( ) gives me only the values without indices, so I > don't know > > how > > > to add these valuse to the global MPI matrix. > > > > > > Thanks agin > > > > > > Waad > > > > > > Barry Smith wrote: > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > Thank you Matt, > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I > want to > > > > solve a linear system: > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > vector. Each CPU has one matrix and one vector of the same > size. Now > > > > I want to sum up and solve the system in parallel. > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > part of the matrix? Or does each process have values > > > scattered all around the matrix? > > > > > > In the former case you should simply create one parallel MPIAIJ > > > matrix and call MatSetValues() to put the values > > > into it. We don't have any kind of support for the later case, > perhaps > > > if you describe how the matrix entries come about someone > > > would have suggestions on how to proceed. > > > > > > Barry > > > > > > > > > > > > > > > Thanks again > > > > > > > > Waad > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > 2:12 PM, Waad Subber wrote: > > > > > Hi, > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > adding up > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt > m,PetscInt > > > > n,MatReuse > > > > > scall,Mat *mpimat) > > > > > > > > > > to do that. However, when I compile the code I get the > following > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > collect2: ld returned 1 exit status > > > > > make: *** [all] Error 1 > > > > > > > > > > Am I using this function correctly ? > > > > > > > > These have no Fortran bindings right now. > > > > > > > > Matt > > > > > > > > > Thanks > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to > which > > > > their experiments lead. 
> > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a > (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Fri May 23 13:45:22 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 23 May 2008 15:45:22 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> References: <212555.70047.qm@web38205.mail.mud.yahoo.com> <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> Message-ID: On 5/23/08, Barry Smith wrote: > I notice Lapack solver for the interior > problem takes a lot of time compare to the iterative solver for the > interface, so now I am replacing the direct factorization with petsc KSP > solver. > > > > In my experience you will want to use a KSP direct solve, for example > just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur > complement is > always way to slow. Indeed!! Waad, do not do that. If your interior problem is big, the only way to go is to implement subdomain subpartitioning. That is, in your local subdomain, you have to select some DOF's and 'mark' them as 'interface' nodes. Then you end-up having many A_ii's at a local subdomain, each corresponding with a sub-subdomain... Of course, implementing all this logic is not trivial at all.. My implementation of all this stuff can be found in petsc4py SVN repository hosted at Google Code. Here you have the (long) link: http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ In directory 'schur' you have a version requiring MATIS matrix type (this is somewhat natural in the context of substructuring and finite elements methods). This corresponds to what Y.Sadd's book calls 'edge-based' partitionings. In directory 'schur_aij' hou have a version (never carefully tested) working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's book calls 'vertex-based' partitionings (more typical to appear in finite difference methods). > > I would like very much to have a look at your implementation, and I think > that will be very useful to me. > > > > Thanks > > > > Waad > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > 1) How do you actually get the local Schur complements. You > > > explicitelly compute its entries, or do you compute it after computing > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > I construct the local Schur complement matrices after getting the > inversion > > > of A_II matrix for each subdomain. > > > > Fine, > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > restrinction operation with ones and zeros? Or R_i is actually a > > > VecScatter? 
> > > > > > R_i is the restriction matrix maps the global boundary nodes to the > local > > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > > so only I need to store the nonzero entries which one entry per a row > > > > I believe a VecScatter will perform much better for this task. > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > Schur complement? In such a case, are you going to implement a > > > preconditioner for it? > > > > > > Yes, that what I am trying to do > > > > Well, please let me make some comments. I've spent many days and month > > optimizing Schur complement iterations, and I ended giving up. I was > > never able to get it perform better than ASM preconditioner (iff > > appropriatelly used, ie. solving local problems with LU, and > > implementing subdomain subpartitioning the smart way, not the way > > currently implemented in PETSc, were subpartitioning is done by chunks > > of continuous rows). > > > > If you are doing research on this, I would love to know your > > conclusion when you get your work done. If you are doing all this just > > with the hope of getting better running times, well, remember my above > > comments but also remember that I do not consider myself a smart guy > > ;-) > > > > As I said before, I worked hard for implementing general Schur > > complement iteration. All this code is avalable in the SVN repository > > of petsc4py (PETSc for Python), but it could be easily stripped out > > for use in any PETSc-based code in C/C++. This implementation requires > > the use of a MATIS matrix type (there is also a separate > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > subpartitioning (using a simple recursive graph splitting procedure > > reusing matrix reordering routines built-in in PETSc, could be done > > better with METIS); when the A_ii problems are large, their LU > > factorization can be a real bootleneck. I've even implemented a > > global preconditioner operation for the interface problem, based on > > iterating over a 'strip' of nodes around the interface; it improves > > convergence and is usefull for ill-conditioned systems, but the costs > > are increased. > > > > If you ever want to take a look at my implemention for try to use it, > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > solve > > > > the interface problem > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > () > > > for > > > > each of the sequential vector then use VecSetValues () to add the > values > > > > into the global MPI vector. That works fine. > > > > > > > > However for the global schur complement matix I try the same idea by > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > MatSetValues () in order to add the values to the global matrix. > > > > MatGetArray( ) gives me only the values without indices, so I don't > know > > > how > > > > to add these valuse to the global MPI matrix. > > > > > > > > Thanks agin > > > > > > > > Waad > > > > > > > > Barry Smith wrote: > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > Thank you Matt, > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. 
I want to > > > > > solve a linear system: > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > > > I want to sum up and solve the system in parallel. > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > part of the matrix? Or does each process have values > > > > scattered all around the matrix? > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > matrix and call MatSetValues() to put the values > > > > into it. We don't have any kind of support for the later case, perhaps > > > > if you describe how the matrix entries come about someone > > > > would have suggestions on how to proceed. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > Waad > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > 2:12 PM, Waad Subber wrote: > > > > > > Hi, > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > adding up > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > n,MatReuse > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > collect2: ld returned 1 exit status > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > Matt > > > > > > > > > > > Thanks > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. 
> > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Fri May 23 13:59:24 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 11:59:24 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <321455.84470.qm@web38207.mail.mud.yahoo.com> Thank you Lisandro, I like the idea of constructing Schur complement of Schur complement matrix. I will give it a try. Thanks again for the suggestions and the link Waad :) Lisandro Dalcin wrote: On 5/23/08, Barry Smith wrote: > I notice Lapack solver for the interior > problem takes a lot of time compare to the iterative solver for the > interface, so now I am replacing the direct factorization with petsc KSP > solver. > > > > In my experience you will want to use a KSP direct solve, for example > just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur > complement is > always way to slow. Indeed!! Waad, do not do that. If your interior problem is big, the only way to go is to implement subdomain subpartitioning. That is, in your local subdomain, you have to select some DOF's and 'mark' them as 'interface' nodes. Then you end-up having many A_ii's at a local subdomain, each corresponding with a sub-subdomain... Of course, implementing all this logic is not trivial at all.. My implementation of all this stuff can be found in petsc4py SVN repository hosted at Google Code. Here you have the (long) link: http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ In directory 'schur' you have a version requiring MATIS matrix type (this is somewhat natural in the context of substructuring and finite elements methods). This corresponds to what Y.Sadd's book calls 'edge-based' partitionings. In directory 'schur_aij' hou have a version (never carefully tested) working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's book calls 'vertex-based' partitionings (more typical to appear in finite difference methods). > > I would like very much to have a look at your implementation, and I think > that will be very useful to me. > > > > Thanks > > > > Waad > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > 1) How do you actually get the local Schur complements. You > > > explicitelly compute its entries, or do you compute it after computing > > > the inverse (or LU factors) of a 'local' matrix? 
> > > > > > I construct the local Schur complement matrices after getting the > inversion > > > of A_II matrix for each subdomain. > > > > Fine, > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > restrinction operation with ones and zeros? Or R_i is actually a > > > VecScatter? > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > local > > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > > so only I need to store the nonzero entries which one entry per a row > > > > I believe a VecScatter will perform much better for this task. > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > Schur complement? In such a case, are you going to implement a > > > preconditioner for it? > > > > > > Yes, that what I am trying to do > > > > Well, please let me make some comments. I've spent many days and month > > optimizing Schur complement iterations, and I ended giving up. I was > > never able to get it perform better than ASM preconditioner (iff > > appropriatelly used, ie. solving local problems with LU, and > > implementing subdomain subpartitioning the smart way, not the way > > currently implemented in PETSc, were subpartitioning is done by chunks > > of continuous rows). > > > > If you are doing research on this, I would love to know your > > conclusion when you get your work done. If you are doing all this just > > with the hope of getting better running times, well, remember my above > > comments but also remember that I do not consider myself a smart guy > > ;-) > > > > As I said before, I worked hard for implementing general Schur > > complement iteration. All this code is avalable in the SVN repository > > of petsc4py (PETSc for Python), but it could be easily stripped out > > for use in any PETSc-based code in C/C++. This implementation requires > > the use of a MATIS matrix type (there is also a separate > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > subpartitioning (using a simple recursive graph splitting procedure > > reusing matrix reordering routines built-in in PETSc, could be done > > better with METIS); when the A_ii problems are large, their LU > > factorization can be a real bootleneck. I've even implemented a > > global preconditioner operation for the interface problem, based on > > iterating over a 'strip' of nodes around the interface; it improves > > convergence and is usefull for ill-conditioned systems, but the costs > > are increased. > > > > If you ever want to take a look at my implemention for try to use it, > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > solve > > > > the interface problem > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > () > > > for > > > > each of the sequential vector then use VecSetValues () to add the > values > > > > into the global MPI vector. That works fine. > > > > > > > > However for the global schur complement matix I try the same idea by > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > MatSetValues () in order to add the values to the global matrix. 
> > > > MatGetArray( ) gives me only the values without indices, so I don't > know > > > how > > > > to add these valuse to the global MPI matrix. > > > > > > > > Thanks agin > > > > > > > > Waad > > > > > > > > Barry Smith wrote: > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > Thank you Matt, > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > > > solve a linear system: > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > > > I want to sum up and solve the system in parallel. > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > part of the matrix? Or does each process have values > > > > scattered all around the matrix? > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > matrix and call MatSetValues() to put the values > > > > into it. We don't have any kind of support for the later case, perhaps > > > > if you describe how the matrix entries come about someone > > > > would have suggestions on how to proceed. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > Waad > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > 2:12 PM, Waad Subber wrote: > > > > > > Hi, > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > adding up > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > n,MatReuse > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > collect2: ld returned 1 exit status > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > Matt > > > > > > > > > > > Thanks > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. 
> > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Fri May 23 15:04:51 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 23 May 2008 17:04:51 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <321455.84470.qm@web38207.mail.mud.yahoo.com> References: <321455.84470.qm@web38207.mail.mud.yahoo.com> Message-ID: On 5/23/08, Waad Subber wrote: > Thank you Lisandro, > > I like the idea of constructing Schur complement of Schur complement matrix. > I will give it a try. Mmm, I was not actually talking about a Schur complemt of a Schur complement. In fact, I suggested that the interface nodes have to be extended with carefully selected interior nodes. This way, all this is equivalent to solving a problem with, let say, 1000 'logical' subdomains, but using let say 100 processors, were the actual local subdomains are splited to have about 10 subdomains per processor. Am I being clear enough? > > Thanks again for the suggestions and the link Enjoy! And please do not blame me for such a contrived code! > > Lisandro Dalcin wrote: > On 5/23/08, Barry Smith wrote: > > I notice Lapack solver for the interior > > problem takes a lot of time compare to the iterative solver for the > > interface, so now I am replacing the direct factorization with petsc KSP > > solver. > > > > > > > In my experience you will want to use a KSP direct solve, for example > > just -ksp_type preonly -pc_type lu; using an iterative solver in such > Schur > > complement is > > always way to slow. > > Indeed!! > > Waad, do not do that. If your interior problem is big, the only way to > go is to implement subdomain subpartitioning. That is, in your local > subdomain, you have to select some DOF's and 'mark' them as > 'interface' nodes. Then you end-up having many A_ii's at a local > subdomain, each corresponding with a sub-subdomain... Of course, > implementing all this logic is not trivial at all.. > > My implementation of all this stuff can be found in petsc4py SVN > repository hosted at Google Code. Here you have the (long) link: > > http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ > > > In directory 'schur' you have a version requiring MATIS matrix type > (this is somewhat natural in the context of substructuring and finite > elements methods). 
This corresponds to what Y.Sadd's book calls > 'edge-based' partitionings. > > In directory 'schur_aij' hou have a version (never carefully tested) > working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's > book calls 'vertex-based' partitionings (more typical to appear in > finite difference methods). > > > > > > > > > I would like very much to have a look at your implementation, and I > think > > that will be very useful to me. > > > > > > Thanks > > > > > > Waad > > > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > > 1) How do you actually get the local Schur complements. You > > > > explicitelly compute its entries, or do you compute it after computing > > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > > > I construct the local Schur complement matrices after getting the > > inversion > > > > of A_II matrix for each subdomain. > > > > > > Fine, > > > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > > restrinction operation with ones and zeros? Or R_i is actually a > > > > VecScatter? > > > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > > local > > > > boundary nodes and its entries is zero and one I store it as spare > > matrix, > > > > so only I need to store the nonzero entries which one entry per a row > > > > > > I believe a VecScatter will perform much better for this task. > > > > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > > Schur complement? In such a case, are you going to implement a > > > > preconditioner for it? > > > > > > > > Yes, that what I am trying to do > > > > > > Well, please let me make some comments. I've spent many days and month > > > optimizing Schur complement iterations, and I ended giving up. I was > > > never able to get it perform better than ASM preconditioner (iff > > > appropriatelly used, ie. solving local problems with LU, and > > > implementing subdomain subpartitioning the smart way, not the way > > > currently implemented in PETSc, were subpartitioning is done by chunks > > > of continuous rows). > > > > > > If you are doing research on this, I would love to know your > > > conclusion when you get your work done. If you are doing all this just > > > with the hope of getting better running times, well, remember my above > > > comments but also remember that I do not consider myself a smart guy > > > ;-) > > > > > > As I said before, I worked hard for implementing general Schur > > > complement iteration. All this code is avalable in the SVN repository > > > of petsc4py (PETSc for Python), but it could be easily stripped out > > > for use in any PETSc-based code in C/C++. This implementation requires > > > the use of a MATIS matrix type (there is also a separate > > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > > subpartitioning (using a simple recursive graph splitting procedure > > > reusing matrix reordering routines built-in in PETSc, could be done > > > better with METIS); when the A_ii problems are large, their LU > > > factorization can be a real bootleneck. I've even implemented a > > > global preconditioner operation for the interface problem, based on > > > iterating over a 'strip' of nodes around the interface; it improves > > > convergence and is usefull for ill-conditioned systems, but the costs > > > are increased. 
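As a generic illustration of running a Krylov method on the interface operator Sum_i R_i^T S_i R_i without assembling it, one option is a shell matrix. This is only a sketch, not the petsc4py implementation referred to above; the context fields (the scatter playing the role of R_i, the local Schur complement S_local and the work vectors) are assumptions, and error checking is omitted.

   #include "petscksp.h"

   typedef struct {
     VecScatter scat;      /* plays the role of R_i / R_i^T        */
     Mat        S_local;   /* local Schur complement S_i           */
     Vec        xl, yl;    /* work vectors of local boundary size  */
   } InterfaceCtx;

   /* y = Sum_i R_i^T S_i R_i x, applied matrix-free */
   PetscErrorCode InterfaceMult(Mat A, Vec x, Vec y)
   {
     InterfaceCtx *ctx;
     MatShellGetContext(A, (void **)&ctx);
     VecScatterBegin(ctx->scat, x, ctx->xl, INSERT_VALUES, SCATTER_FORWARD);
     VecScatterEnd(ctx->scat, x, ctx->xl, INSERT_VALUES, SCATTER_FORWARD);
     MatMult(ctx->S_local, ctx->xl, ctx->yl);                /* S_i (R_i x)  */
     VecZeroEntries(y);
     VecScatterBegin(ctx->scat, ctx->yl, y, ADD_VALUES, SCATTER_REVERSE);
     VecScatterEnd(ctx->scat, ctx->yl, y, ADD_VALUES, SCATTER_REVERSE);
     return 0;
   }

   PetscErrorCode CreateInterfaceOperator(InterfaceCtx *ctx, PetscInt nb_owned, Mat *S)
   {
     MatCreateShell(PETSC_COMM_WORLD, nb_owned, nb_owned,
                    PETSC_DETERMINE, PETSC_DETERMINE, (void *)ctx, S);
     MatShellSetOperation(*S, MATOP_MULT, (void (*)(void))InterfaceMult);
     return 0;
   }

The shell matrix can then be handed to KSPSetOperators() like any assembled matrix; the price, as the discussion above makes clear, is that preconditioning such an operator is the hard part.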
> > > > > > If you ever want to take a look at my implemention for try to use it, > > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > > solve > > > > > the interface problem > > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > > () > > > > for > > > > > each of the sequential vector then use VecSetValues () to add the > > values > > > > > into the global MPI vector. That works fine. > > > > > > > > > > However for the global schur complement matix I try the same idea by > > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > > MatSetValues () in order to add the values to the global matrix. > > > > > MatGetArray( ) gives me only the values without indices, so I don't > > know > > > > how > > > > > to add these valuse to the global MPI matrix. > > > > > > > > > > Thanks agin > > > > > > > > > > Waad > > > > > > > > > > Barry Smith wrote: > > > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > > > Thank you Matt, > > > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want > to > > > > > > solve a linear system: > > > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > > vector. Each CPU has one matrix and one vector of the same size. > Now > > > > > > I want to sum up and solve the system in parallel. > > > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > > part of the matrix? Or does each process have values > > > > > scattered all around the matrix? > > > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > > matrix and call MatSetValues() to put the values > > > > > into it. We don't have any kind of support for the later case, > perhaps > > > > > if you describe how the matrix entries come about someone > > > > > would have suggestions on how to proceed. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > > > Waad > > > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > > 2:12 PM, Waad Subber wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > > adding up > > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > > n,MatReuse > > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > > collect2: ld returned 1 exit status > > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > > > Matt > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to > which > > > > > > their experiments lead. 
> > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Lisandro Dalc?n > > > > --------------- > > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Fri May 23 15:30:40 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 13:30:40 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <790235.898.qm@web38206.mail.mud.yahoo.com> I see what do you mean Thanks :) Lisandro Dalcin wrote: On 5/23/08, Waad Subber wrote: > Thank you Lisandro, > > I like the idea of constructing Schur complement of Schur complement matrix. > I will give it a try. Mmm, I was not actually talking about a Schur complemt of a Schur complement. In fact, I suggested that the interface nodes have to be extended with carefully selected interior nodes. This way, all this is equivalent to solving a problem with, let say, 1000 'logical' subdomains, but using let say 100 processors, were the actual local subdomains are splited to have about 10 subdomains per processor. Am I being clear enough? > > Thanks again for the suggestions and the link Enjoy! And please do not blame me for such a contrived code! > > Lisandro Dalcin wrote: > On 5/23/08, Barry Smith wrote: > > I notice Lapack solver for the interior > > problem takes a lot of time compare to the iterative solver for the > > interface, so now I am replacing the direct factorization with petsc KSP > > solver. > > > > > > > In my experience you will want to use a KSP direct solve, for example > > just -ksp_type preonly -pc_type lu; using an iterative solver in such > Schur > > complement is > > always way to slow. > > Indeed!! > > Waad, do not do that. If your interior problem is big, the only way to > go is to implement subdomain subpartitioning. That is, in your local > subdomain, you have to select some DOF's and 'mark' them as > 'interface' nodes. Then you end-up having many A_ii's at a local > subdomain, each corresponding with a sub-subdomain... Of course, > implementing all this logic is not trivial at all.. 
> > My implementation of all this stuff can be found in petsc4py SVN > repository hosted at Google Code. Here you have the (long) link: > > http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ > > > In directory 'schur' you have a version requiring MATIS matrix type > (this is somewhat natural in the context of substructuring and finite > elements methods). This corresponds to what Y.Sadd's book calls > 'edge-based' partitionings. > > In directory 'schur_aij' hou have a version (never carefully tested) > working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's > book calls 'vertex-based' partitionings (more typical to appear in > finite difference methods). > > > > > > > > > I would like very much to have a look at your implementation, and I > think > > that will be very useful to me. > > > > > > Thanks > > > > > > Waad > > > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > > 1) How do you actually get the local Schur complements. You > > > > explicitelly compute its entries, or do you compute it after computing > > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > > > I construct the local Schur complement matrices after getting the > > inversion > > > > of A_II matrix for each subdomain. > > > > > > Fine, > > > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > > restrinction operation with ones and zeros? Or R_i is actually a > > > > VecScatter? > > > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > > local > > > > boundary nodes and its entries is zero and one I store it as spare > > matrix, > > > > so only I need to store the nonzero entries which one entry per a row > > > > > > I believe a VecScatter will perform much better for this task. > > > > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > > Schur complement? In such a case, are you going to implement a > > > > preconditioner for it? > > > > > > > > Yes, that what I am trying to do > > > > > > Well, please let me make some comments. I've spent many days and month > > > optimizing Schur complement iterations, and I ended giving up. I was > > > never able to get it perform better than ASM preconditioner (iff > > > appropriatelly used, ie. solving local problems with LU, and > > > implementing subdomain subpartitioning the smart way, not the way > > > currently implemented in PETSc, were subpartitioning is done by chunks > > > of continuous rows). > > > > > > If you are doing research on this, I would love to know your > > > conclusion when you get your work done. If you are doing all this just > > > with the hope of getting better running times, well, remember my above > > > comments but also remember that I do not consider myself a smart guy > > > ;-) > > > > > > As I said before, I worked hard for implementing general Schur > > > complement iteration. All this code is avalable in the SVN repository > > > of petsc4py (PETSc for Python), but it could be easily stripped out > > > for use in any PETSc-based code in C/C++. This implementation requires > > > the use of a MATIS matrix type (there is also a separate > > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > > subpartitioning (using a simple recursive graph splitting procedure > > > reusing matrix reordering routines built-in in PETSc, could be done > > > better with METIS); when the A_ii problems are large, their LU > > > factorization can be a real bootleneck. 
I've even implemented a > > > global preconditioner operation for the interface problem, based on > > > iterating over a 'strip' of nodes around the interface; it improves > > > convergence and is usefull for ill-conditioned systems, but the costs > > > are increased. > > > > > > If you ever want to take a look at my implemention for try to use it, > > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > > solve > > > > > the interface problem > > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > > () > > > > for > > > > > each of the sequential vector then use VecSetValues () to add the > > values > > > > > into the global MPI vector. That works fine. > > > > > > > > > > However for the global schur complement matix I try the same idea by > > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > > MatSetValues () in order to add the values to the global matrix. > > > > > MatGetArray( ) gives me only the values without indices, so I don't > > know > > > > how > > > > > to add these valuse to the global MPI matrix. > > > > > > > > > > Thanks agin > > > > > > > > > > Waad > > > > > > > > > > Barry Smith wrote: > > > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > > > Thank you Matt, > > > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want > to > > > > > > solve a linear system: > > > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > > vector. Each CPU has one matrix and one vector of the same size. > Now > > > > > > I want to sum up and solve the system in parallel. > > > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > > part of the matrix? Or does each process have values > > > > > scattered all around the matrix? > > > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > > matrix and call MatSetValues() to put the values > > > > > into it. We don't have any kind of support for the later case, > perhaps > > > > > if you describe how the matrix entries come about someone > > > > > would have suggestions on how to proceed. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > > > Waad > > > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > > 2:12 PM, Waad Subber wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > > adding up > > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > > n,MatReuse > > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > > collect2: ld returned 1 exit status > > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > > > These have no Fortran bindings right now. 
> > > > > > > > > > > > Matt > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to > which > > > > > > their experiments lead. > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Lisandro Dalc?n > > > > --------------- > > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Sun May 25 05:43:19 2008 From: zonexo at gmail.com (Ben Tay) Date: Sun, 25 May 2008 18:43:19 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix Message-ID: <483942C7.7080705@gmail.com> Hi, I have an old serial code and a newer parallel code. The new parallel code is converted from the old serial code. However, due to numerous changes, the answers from the new code now differs from the old one after the 1st step. What is the best way to compare the matrices from the 2 different code? I guess the most direct mtd is to use MatView to store the matrix in a ACSII file and spot the difference between the 2 files. However, I can't seem to get it right. What I did is: PetscViewer viewer call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) call MatView(A_mat_uv,viewer,ierr) call PetscViewerDestroy(viewer,ierr) call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) However, I get the error that "PetscViewer viewer" has syntax error. Hope you can help me out. Thank you very much. 
Regards From bsmith at mcs.anl.gov Sun May 25 08:55:12 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 25 May 2008 08:55:12 -0500 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483942C7.7080705@gmail.com> References: <483942C7.7080705@gmail.com> Message-ID: <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> Likely you forgot to #include "finclude/petscviewer.h" in the Fortran subroutine that is doing this stuff. On May 25, 2008, at 5:43 AM, Ben Tay wrote: > Hi, > > I have an old serial code and a newer parallel code. The new > parallel code is converted from the old serial code. However, due to > numerous changes, the answers from the new code now differs from the > old one after the 1st step. What is the best way to compare the > matrices from the 2 different code? > > I guess the most direct mtd is to use MatView to store the matrix in > a ACSII file and spot the difference between the 2 files. However, I > can't seem to get it right. What I did is: > > PetscViewer viewer > > call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) > > call MatView(A_mat_uv,viewer,ierr) > > call PetscViewerDestroy(viewer,ierr) > > call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) > > However, I get the error that "PetscViewer viewer" has syntax error. > Hope you can help me out. > > Thank you very much. > > Regards > > From dalcinl at gmail.com Mon May 26 10:47:11 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 26 May 2008 12:47:11 -0300 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483942C7.7080705@gmail.com> References: <483942C7.7080705@gmail.com> Message-ID: On 5/25/08, Ben Tay wrote: > What is the best way to compare the matrices from the 2 different code? > > I guess the most direct mtd is to use MatView to store the matrix in a > ACSII file and spot the difference between the 2 files. However, I can't > seem to get it right. What I did is: But then take into account that in the parallel case, the numbering of the rows/cols of the matrix will perhaps not match the one of the sequential case. If you save the parallel matrix, you will also need to save an appropriate permutation to be able to actually compare the matrices. Or perhaps better, a combination of an application ordering, an index set, and MatPermute(). -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From zonexo at gmail.com Tue May 27 05:57:08 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 18:57:08 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> Message-ID: <483BE904.4090608@gmail.com> Hi Barry, Thanks for pointing out. However, I only got a 0 byte file after adding the include statement. I am programming in parallel and my matrix is created using MatCreateMPIAIJ. Did I missed out some commands? PetscViewer viewer ... call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) call MatView(A_mat_uv,viewer,ierr) call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) call PetscViewerDestroy(viewer,ierr) Thank you very much. Regards. 
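The zero-byte file reported above appears to come from the ordering of the calls: MatView() runs before the viewer has a type and a file attached, and the file opened afterwards by PetscViewerASCIIOpen() is never written to (Barry's reply below spells out the fix). In outline, shown here in C although the ordering is the same for the Fortran calls above, with A_mat_uv taken from the code above:

   #include "petscviewer.h"
   #include "petscmat.h"

   PetscViewer viewer;

   PetscViewerASCIIOpen(PETSC_COMM_WORLD, "matrix.txt", &viewer);  /* open first */
   MatView(A_mat_uv, viewer);                                      /* then write */
   PetscViewerDestroy(viewer);   /* PETSc 2.3.x; later releases take &viewer instead */

With this ordering there is no PetscViewerCreate() call at all, since PetscViewerASCIIOpen() creates and configures the viewer in one step.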
Barry Smith wrote: > > Likely you forgot to #include "finclude/petscviewer.h" in the > Fortran subroutine that > is doing this stuff. > > > On May 25, 2008, at 5:43 AM, Ben Tay wrote: > >> Hi, >> >> I have an old serial code and a newer parallel code. The new parallel >> code is converted from the old serial code. However, due to numerous >> changes, the answers from the new code now differs from the old one >> after the 1st step. What is the best way to compare the matrices from >> the 2 different code? >> >> I guess the most direct mtd is to use MatView to store the matrix in >> a ACSII file and spot the difference between the 2 files. However, I >> can't seem to get it right. What I did is: >> >> PetscViewer viewer >> >> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >> >> call MatView(A_mat_uv,viewer,ierr) >> >> call PetscViewerDestroy(viewer,ierr) >> >> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >> >> However, I get the error that "PetscViewer viewer" has syntax error. >> Hope you can help me out. >> >> Thank you very much. >> >> Regards >> >> > > From bsmith at mcs.anl.gov Tue May 27 07:52:01 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 07:52:01 -0500 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483BE904.4090608@gmail.com> References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> <483BE904.4090608@gmail.com> Message-ID: You use EITHER PetscViewerCreate() and PetscViewerSetType() and PetscViewerFileSetName() OR PetscViewerASCIIOpen() so use > call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >> call MatView(A_mat_uv,viewer,ierr) > call PetscViewerDestroy(viewer,ierr) On May 27, 2008, at 5:57 AM, Ben Tay wrote: > Hi Barry, > > Thanks for pointing out. However, I only got a 0 byte file after > adding the include statement. > > I am programming in parallel and my matrix is created using > MatCreateMPIAIJ. Did I missed out some commands? > > PetscViewer viewer > > > ... > > call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) > > call MatView(A_mat_uv,viewer,ierr) > > call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) > > call PetscViewerDestroy(viewer,ierr) > > Thank you very much. > > Regards. > > > Barry Smith wrote: >> >> Likely you forgot to #include "finclude/petscviewer.h" in the >> Fortran subroutine that >> is doing this stuff. >> >> >> On May 25, 2008, at 5:43 AM, Ben Tay wrote: >> >>> Hi, >>> >>> I have an old serial code and a newer parallel code. The new >>> parallel code is converted from the old serial code. However, due >>> to numerous changes, the answers from the new code now differs >>> from the old one after the 1st step. What is the best way to >>> compare the matrices from the 2 different code? >>> >>> I guess the most direct mtd is to use MatView to store the matrix >>> in a ACSII file and spot the difference between the 2 files. >>> However, I can't seem to get it right. What I did is: >>> >>> PetscViewer viewer >>> >>> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >>> >>> call MatView(A_mat_uv,viewer,ierr) >>> >>> call PetscViewerDestroy(viewer,ierr) >>> >>> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >>> >>> However, I get the error that "PetscViewer viewer" has syntax >>> error. Hope you can help me out. >>> >>> Thank you very much. 
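For completeness, a sketch of the other branch Barry names above (PetscViewerCreate() plus PetscViewerSetType() plus PetscViewerFileSetName()), equivalent to the PetscViewerASCIIOpen() route sketched earlier; this is not code from the thread, and error checking is omitted.

   #include "petscviewer.h"

   PetscViewer viewer;

   PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
   PetscViewerSetType(viewer, PETSC_VIEWER_ASCII);  /* spelled PETSCVIEWERASCII in
                                                       recent PETSc releases        */
   PetscViewerFileSetName(viewer, "matrix.txt");
   MatView(A_mat_uv, viewer);
   PetscViewerDestroy(viewer);                      /* &viewer in recent releases   */

Either way, MatView() must come after the viewer is fully set up and before it is destroyed.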
>>> >>> Regards >>> >>> >> >> > > From Amit.Itagi at seagate.com Tue May 27 10:18:40 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 27 May 2008 11:18:40 -0400 Subject: Code structuring - Communicator In-Reply-To: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: Barry, I got a part of what I was trying to do (sub-communicator etc.), working. Now suppose I want to repeat a calculation with a different input, I have two ways of doing it (based on what I have coded). 1) MPI_Initialize Create a group using MPI_Comm_group Create several sub-groups and sub-communicators using MPI_Group_Incl and MPI_Comm_create Assign the sub-communicator to PETSC_COMM_WORLD // Calculation 1 { Do PetscInitialize Perform the calculation Do PetscFinalize } // Calculation 2 { Do PetscInitialize Perform the calculation Do PetscFinalize } Do MPI_finalize 2) MPI_Initialize Create a group using MPI_Comm_group Create several sub-groups and sub-communicators using MPI_Group_Incl and MPI_Comm_create Assign the sub-communicator to PETSC_COMM_WORLD Do PetscInitialize // Calculation 1 { Perform the calculation } // Calculation 2 { Perform the calculation } Do PetscFinalize Do MPI_finalize The first method crashes. I am trying to understand why. The documentation says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not called before PetscInitialize. In my case, is PetscFinalize destroying the sub-communicators ? Thanks Rgds, Amit Barry Smith To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: Code structuring - Communicator 05/09/2008 03:07 PM Please respond to petsc-users at mcs.a nl.gov There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. "foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? 
If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. > > Thanks > > Rgds, > Amit > From knepley at gmail.com Tue May 27 10:23:44 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 May 2008 10:23:44 -0500 Subject: Code structuring - Communicator In-Reply-To: References: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: On Tue, May 27, 2008 at 10:18 AM, wrote: > Barry, > > I got a part of what I was trying to do (sub-communicator etc.), working. > Now suppose I want to repeat a calculation with a different input, I have > two ways of doing it (based on what I have coded). > > 1) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > // Calculation 1 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > // Calculation 2 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > Do MPI_finalize > > 2) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > Do PetscInitialize > // Calculation 1 > { > Perform the calculation > } > // Calculation 2 > { > Perform the calculation > } > Do PetscFinalize > Do MPI_finalize > > > The first method crashes. I am trying to understand why. The documentation What do you mean "crashes" and what line does it happen on? You can use -start_in_debugger to get a stack trace. I do not completely understand your pseudocode, however, you should never call PetscInitialize()/Finalize() more than once. Matt > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not > called before PetscInitialize. In my case, is PetscFinalize destroying the > sub-communicators ? > > Thanks > > Rgds, > Amit > > > > > > Barry Smith > ov> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: Code structuring - Communicator > > > 05/09/2008 03:07 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > There are many ways to do this, most of them involve using MPI to > construct subcommunicators > for the various sub parallel tasks. You very likely want to keep > PetscInitialize() at > the very beginning of the program; you would not write the calls in > terms of > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > subcommunicators to create the objects. > > An alternative approach is to look at the manual page for > PetscOpenMPMerge(), PetscOpenMPRun(), > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > model of parallelism > with PETSc with a bunch of masters that can work together (instead of > just one master) and each > master controls a bunch of workers. The code in src/ksp/pc/impls/ > openmp uses this code. > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > don't really have > any support for Fortran, I hope you use C/C++. Comments welcome. It > sounds like this matches > what you need. It's pretty cool, but underdeveloped. > > Barry > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > >> >> Hi, >> >> I have a question about the Petsc communicator. 
I have a petsc program >> "foo" which essentially runs in parallel and gives me >> y=f(x1,x2,...), where >> y is an output parameter and xi's are input parameters. Suppose, I >> want to >> run a parallel optimizer for the input parameters. I am looking at the >> following functionality. I submit the optimizer job on 16 processors >> (using >> "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of >> "foo", each running parallely on 4 processors. "foo" will be written >> as a >> function and not as a main program in this case. How can I get this >> functionality using Petsc ? Should PetscInitialize be called in the >> optimizer, or in each foo run ? If PetscInitialize is called in the >> optimizer, is there a way to make the foo function run only on a >> subset of >> the 16 processors ? >> >> May be, I haven't done a good job of explaining my problem. Let me >> know if >> you need any clarifications. >> >> Thanks >> >> Rgds, >> Amit >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From zonexo at gmail.com Tue May 27 10:23:56 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 23:23:56 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> <483BE904.4090608@gmail.com> Message-ID: <483C278C.4030604@gmail.com> Hi Barry, Got it working. Thanks alot! Barry Smith wrote: > > You use EITHER PetscViewerCreate() and PetscViewerSetType() and > PetscViewerFileSetName() OR PetscViewerASCIIOpen() > so use > > >> call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >>> call MatView(A_mat_uv,viewer,ierr) >> call PetscViewerDestroy(viewer,ierr) > > > > On May 27, 2008, at 5:57 AM, Ben Tay wrote: > >> Hi Barry, >> >> Thanks for pointing out. However, I only got a 0 byte file after >> adding the include statement. >> >> I am programming in parallel and my matrix is created using >> MatCreateMPIAIJ. Did I missed out some commands? >> >> PetscViewer viewer >> >> >> ... >> >> call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) >> >> call MatView(A_mat_uv,viewer,ierr) >> >> call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >> >> call PetscViewerDestroy(viewer,ierr) >> >> Thank you very much. >> >> Regards. >> >> >> Barry Smith wrote: >>> >>> Likely you forgot to #include "finclude/petscviewer.h" in the >>> Fortran subroutine that >>> is doing this stuff. >>> >>> >>> On May 25, 2008, at 5:43 AM, Ben Tay wrote: >>> >>>> Hi, >>>> >>>> I have an old serial code and a newer parallel code. The new >>>> parallel code is converted from the old serial code. However, due >>>> to numerous changes, the answers from the new code now differs from >>>> the old one after the 1st step. What is the best way to compare the >>>> matrices from the 2 different code? >>>> >>>> I guess the most direct mtd is to use MatView to store the matrix >>>> in a ACSII file and spot the difference between the 2 files. >>>> However, I can't seem to get it right. What I did is: >>>> >>>> PetscViewer viewer >>>> >>>> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >>>> >>>> call MatView(A_mat_uv,viewer,ierr) >>>> >>>> call PetscViewerDestroy(viewer,ierr) >>>> >>>> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >>>> >>>> However, I get the error that "PetscViewer viewer" has syntax >>>> error. 
Hope you can help me out. >>>> >>>> Thank you very much. >>>> >>>> Regards >>>> >>>> >>> >>> >> >> > > From bsmith at mcs.anl.gov Tue May 27 10:24:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 10:24:04 -0500 Subject: Code structuring - Communicator In-Reply-To: References: Message-ID: You cannot call PetscInitialize() twice. Barry On May 27, 2008, at 10:18 AM, Amit.Itagi at seagate.com wrote: > Barry, > > I got a part of what I was trying to do (sub-communicator etc.), > working. > Now suppose I want to repeat a calculation with a different input, I > have > two ways of doing it (based on what I have coded). > > 1) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl > and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > // Calculation 1 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > // Calculation 2 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > Do MPI_finalize > > 2) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl > and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > Do PetscInitialize > // Calculation 1 > { > Perform the calculation > } > // Calculation 2 > { > Perform the calculation > } > Do PetscFinalize > Do MPI_finalize > > > The first method crashes. I am trying to understand why. The > documentation > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is > not > called before PetscInitialize. In my case, is PetscFinalize > destroying the > sub-communicators ? > > Thanks > > Rgds, > Amit > > > > > > Barry Smith > > ov> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc- > users cc > @mcs.anl.gov > No Phone Info > Subject > Available Re: Code structuring - > Communicator > > > 05/09/2008 03:07 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > There are many ways to do this, most of them involve using MPI to > construct subcommunicators > for the various sub parallel tasks. You very likely want to keep > PetscInitialize() at > the very beginning of the program; you would not write the calls in > terms of > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > subcommunicators to create the objects. > > An alternative approach is to look at the manual page for > PetscOpenMPMerge(), PetscOpenMPRun(), > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > model of parallelism > with PETSc with a bunch of masters that can work together (instead of > just one master) and each > master controls a bunch of workers. The code in src/ksp/pc/impls/ > openmp uses this code. > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > don't really have > any support for Fortran, I hope you use C/C++. Comments welcome. It > sounds like this matches > what you need. It's pretty cool, but underdeveloped. > > Barry > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > >> >> Hi, >> >> I have a question about the Petsc communicator. I have a petsc >> program >> "foo" which essentially runs in parallel and gives me >> y=f(x1,x2,...), where >> y is an output parameter and xi's are input parameters. Suppose, I >> want to >> run a parallel optimizer for the input parameters. I am looking at >> the >> following functionality. I submit the optimizer job on 16 processors >> (using >> "mpiexec -np 16 progName"). 
The optimizer should then submit 4 runs >> of >> "foo", each running parallely on 4 processors. "foo" will be written >> as a >> function and not as a main program in this case. How can I get this >> functionality using Petsc ? Should PetscInitialize be called in the >> optimizer, or in each foo run ? If PetscInitialize is called in the >> optimizer, is there a way to make the foo function run only on a >> subset of >> the 16 processors ? >> >> May be, I haven't done a good job of explaining my problem. Let me >> know if >> you need any clarifications. >> >> Thanks >> >> Rgds, >> Amit >> > > > > From zonexo at gmail.com Tue May 27 10:58:55 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 23:58:55 +0800 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error Message-ID: <483C2FBF.90509@gmail.com> Hi, I read in the manual that I can use either call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) or call MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my momentum eqn, I get the error on my home computer udring compiling: :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: This name does not have a type, and must have an explicit type. [MAT_NO_NEW_NONZERO_LOCATIONS] call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) Although this error does not happen when I compile in linux, I can this error during run: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ I use KSPRICHARDSON and PCILU for my momentum eqn matrix. When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my poisson eqn which is symmetric, I get the same error as above during run. Btw, I'm using hypre as the preconditioner and default solver. May I know why using these options give the error? Thank you very much. Regards. From knepley at gmail.com Tue May 27 11:13:12 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 May 2008 11:13:12 -0500 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error In-Reply-To: <483C2FBF.90509@gmail.com> References: <483C2FBF.90509@gmail.com> Message-ID: On Tue, May 27, 2008 at 10:58 AM, Ben Tay wrote: > Hi, > > I read in the manual that I can use either call > MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) or call > MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). > > When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my momentum > eqn, I get the error on my home computer udring compiling: > > :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: This name > does not have a type, and must have an explicit type. 
> [MAT_NO_NEW_NONZERO_LOCATIONS] > call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) 1) It appears this symbol was not defined for Fortran. This is fixed in petsc-dev. > Although this error does not happen when I compile in linux, I can this > error during run: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC > ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to > find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > I use KSPRICHARDSON and PCILU for my momentum eqn matrix. > > When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my poisson > eqn which is symmetric, I get the same error as above during run. Btw, I'm > using hypre as the preconditioner and default solver. This option does not actually do anything. The SEGV must come from another error. Matt > May I know why using these options give the error? > > Thank you very much. > > Regards. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Amit.Itagi at seagate.com Tue May 27 12:10:25 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 27 May 2008 13:10:25 -0400 Subject: Code structuring - Communicator In-Reply-To: Message-ID: owner-petsc-users at mcs.anl.gov wrote on 05/27/2008 11:23:44 AM: > On Tue, May 27, 2008 at 10:18 AM, wrote: > > Barry, > > > > I got a part of what I was trying to do (sub-communicator etc.), working. > > Now suppose I want to repeat a calculation with a different input, I have > > two ways of doing it (based on what I have coded). > > > > 1) > > > > MPI_Initialize > > Create a group using MPI_Comm_group > > Create several sub-groups and sub-communicators using MPI_Group_Incl and > > MPI_Comm_create > > Assign the sub-communicator to PETSC_COMM_WORLD > > // Calculation 1 > > { > > Do PetscInitialize > > Perform the calculation > > Do PetscFinalize > > } > > // Calculation 2 > > { > > Do PetscInitialize > > Perform the calculation > > Do PetscFinalize > > } > > Do MPI_finalize > > > > 2) > > > > MPI_Initialize > > Create a group using MPI_Comm_group > > Create several sub-groups and sub-communicators using MPI_Group_Incl and > > MPI_Comm_create > > Assign the sub-communicator to PETSC_COMM_WORLD > > Do PetscInitialize > > // Calculation 1 > > { > > Perform the calculation > > } > > // Calculation 2 > > { > > Perform the calculation > > } > > Do PetscFinalize > > Do MPI_finalize > > > > > > The first method crashes. I am trying to understand why. The documentation > > What do you mean "crashes" and what line does it happen on? You can use > -start_in_debugger to get a stack trace. I do not completely understand your > pseudocode, however, you should never call PetscInitialize()/Finalize() more > than once. 
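A minimal sketch of the arrangement being recommended, essentially option 2 above: MPI_Init and MPI_Finalize bracket the whole run, PETSC_COMM_WORLD is pointed at the sub-communicator before PetscInitialize, and PetscInitialize/PetscFinalize are each called exactly once around all the calculations. MPI_Comm_split and the trivial RunOneCalculation body are illustrative stand-ins for the group-based construction and the real solves described in the posts; 2.3.x calling sequences are assumed.

  #include "petsc.h"
  #include "petscvec.h"

  /* Placeholder for the real work: create, use and destroy a small vector
     on the given sub-communicator. */
  PetscErrorCode RunOneCalculation(MPI_Comm comm)
  {
    Vec            x;
    PetscErrorCode ierr;
    ierr = VecCreateMPI(comm, 10, PETSC_DETERMINE, &x);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = VecDestroy(x);CHKERRQ(ierr);   /* 2.3.x signature */
    return 0;
  }

  int main(int argc, char **argv)
  {
    MPI_Comm subcomm;
    int      rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* MPI_Comm_split used as a shorthand for the MPI_Comm_group /
       MPI_Group_incl / MPI_Comm_create construction; e.g. 16 ranks -> 4
       groups of 4, as in the original scenario. */
    MPI_Comm_split(MPI_COMM_WORLD, rank / 4, rank, &subcomm);

    PETSC_COMM_WORLD = subcomm;                              /* before PetscInitialize */
    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);   /* exactly once */

    RunOneCalculation(subcomm);   /* calculation 1 */
    RunOneCalculation(subcomm);   /* calculation 2: new input, same PETSc state */

    PetscFinalize();              /* exactly once, before MPI_Finalize */
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
  }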
As Barry pointed out, the multiple calls to PetscInitialize is the likely reason for my problem. Thanks, Matt and Barry. > > Matt > > > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not > > called before PetscInitialize. In my case, is PetscFinalize destroying the > > sub-communicators ? > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > > > Barry Smith > > > ov> To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users cc > > @mcs.anl.gov > > No Phone Info Subject > > Available Re: Code structuring - Communicator > > > > > > 05/09/2008 03:07 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > There are many ways to do this, most of them involve using MPI to > > construct subcommunicators > > for the various sub parallel tasks. You very likely want to keep > > PetscInitialize() at > > the very beginning of the program; you would not write the calls in > > terms of > > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > > subcommunicators to create the objects. > > > > An alternative approach is to look at the manual page for > > PetscOpenMPMerge(), PetscOpenMPRun(), > > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > > model of parallelism > > with PETSc with a bunch of masters that can work together (instead of > > just one master) and each > > master controls a bunch of workers. The code in src/ksp/pc/impls/ > > openmp uses this code. > > > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > > don't really have > > any support for Fortran, I hope you use C/C++. Comments welcome. It > > sounds like this matches > > what you need. It's pretty cool, but underdeveloped. > > > > Barry > > > > > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > > >> > >> Hi, > >> > >> I have a question about the Petsc communicator. I have a petsc program > >> "foo" which essentially runs in parallel and gives me > >> y=f(x1,x2,...), where > >> y is an output parameter and xi's are input parameters. Suppose, I > >> want to > >> run a parallel optimizer for the input parameters. I am looking at the > >> following functionality. I submit the optimizer job on 16 processors > >> (using > >> "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > >> "foo", each running parallely on 4 processors. "foo" will be written > >> as a > >> function and not as a main program in this case. How can I get this > >> functionality using Petsc ? Should PetscInitialize be called in the > >> optimizer, or in each foo run ? If PetscInitialize is called in the > >> optimizer, is there a way to make the foo function run only on a > >> subset of > >> the 16 processors ? > >> > >> May be, I haven't done a good job of explaining my problem. Let me > >> know if > >> you need any clarifications. > >> > >> Thanks > >> > >> Rgds, > >> Amit > >> > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > From bsmith at mcs.anl.gov Tue May 27 12:31:44 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 12:31:44 -0500 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error In-Reply-To: References: <483C2FBF.90509@gmail.com> Message-ID: <44E50826-91FD-40CF-BE02-EC9E0CCDDAE9@mcs.anl.gov> The calling sequence for this routine changed recently, make sure you are using the same PETSc version on your different systems. Also please send bug reports like this to petsc-maint at mcs.anl.gov not petsc-users at mcs.anl.gov Barry On May 27, 2008, at 11:13 AM, Matthew Knepley wrote: > On Tue, May 27, 2008 at 10:58 AM, Ben Tay wrote: >> Hi, >> >> I read in the manual that I can use either call >> MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) >> or call >> MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). >> >> When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my >> momentum >> eqn, I get the error on my home computer udring compiling: >> >> :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: >> This name >> does not have a type, and must have an explicit type. >> [MAT_NO_NEW_NONZERO_LOCATIONS] >> call >> MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > > 1) It appears this symbol was not defined for Fortran. This is fixed > in petsc-dev. > >> Although this error does not happen when I compile in linux, I can >> this >> error during run: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or - >> on_error_attach_debugger >> [0]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal >> [0]PETSC >> ERROR: or try http://valgrind.org on linux or man libgmalloc on >> Apple to >> find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, >> link, and >> run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Signal received! >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> >> I use KSPRICHARDSON and PCILU for my momentum eqn matrix. >> >> When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my >> poisson >> eqn which is symmetric, I get the same error as above during run. >> Btw, I'm >> using hypre as the preconditioner and default solver. > > This option does not actually do anything. The SEGV must come from > another error. > > Matt > >> May I know why using these options give the error? >> >> Thank you very much. >> >> Regards. >> >> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > >

From balay at mcs.anl.gov Wed May 28 08:12:39 2008
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 28 May 2008 08:12:39 -0500 (CDT)
Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from [Marco Schauer ] (fwd)
Message-ID:

From: Marco Schauer
Date: Wed, 28 May 2008 12:29:47 +0200
To: petsc-users at mcs.anl.gov
Subject: How to use SNES

Hello, I'd like to solve a nonlinear system of equations. My system looks like this: K(u)*u = f, in which u and f are vectors and K is a matrix that depends on u. I already have a function that calculates K, but I don't know how I can use PETSc to solve this system?
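Matt's reply below gives the standard approach: write the problem as a residual F(u) = K(u)*u - f = 0, hand that function to SNES, and let -snes_fd supply the Jacobian by finite differences to begin with. A sketch of such a residual callback follows; FormFunction, BuildK (standing in for the existing routine that assembles K for a given u) and the AppCtx layout are illustrative assumptions rather than code from the original post, and 2.3.x calling sequences are assumed.

  #include "petscsnes.h"

  typedef struct {
    Mat K;   /* work matrix, reassembled for each new u */
    Vec f;   /* right-hand side */
  } AppCtx;

  extern PetscErrorCode BuildK(Vec u, Mat K);   /* the user's existing K(u) assembly */

  /* Residual F(u) = K(u)*u - f */
  PetscErrorCode FormFunction(SNES snes, Vec u, Vec F, void *ptr)
  {
    AppCtx         *ctx = (AppCtx*)ptr;
    PetscErrorCode ierr;

    ierr = BuildK(u, ctx->K);CHKERRQ(ierr);         /* K(u)        */
    ierr = MatMult(ctx->K, u, F);CHKERRQ(ierr);     /* F = K(u)*u  */
    ierr = VecAXPY(F, -1.0, ctx->f);CHKERRQ(ierr);  /* F = F - f   */
    return 0;
  }

  /* usage, roughly:
       SNESCreate(PETSC_COMM_WORLD,&snes);
       SNESSetFunction(snes,r,FormFunction,&ctx);
       SNESSetFromOptions(snes);          run with -snes_fd
       SNESSolve(snes,PETSC_NULL,u);      2.3.x calling sequence
  */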
Thanks for support, kind regard Marco Schauer From knepley at gmail.com Wed May 28 08:20:07 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 May 2008 08:20:07 -0500 Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from [Marco Schauer ] (fwd) In-Reply-To: References: Message-ID: On Wed, May 28, 2008 at 8:12 AM, Satish Balay wrote: > > Approved:bsbglmdk > > Received: from mailgw.mcs.anl.gov (mailgw.mcs.anl.gov [140.221.9.4]) > by mcs.anl.gov (8.11.6/8.9.3) with ESMTP id m4SAUFM21948 > for ; Wed, 28 May 2008 05:30:16 -0500 > Received: from localhost (localhost [127.0.0.1]) > by mailgw.mcs.anl.gov (Postfix) with ESMTP id D6BA0348003 > for ; Wed, 28 May 2008 05:30:15 -0500 (CDT) > X-Greylist: delayed 23 seconds by postgrey-1.21 at mailgw.mcs.anl.gov; Wed, 28 May 2008 05:30:14 CDT > Received: from rzcomm22.rz.tu-bs.de (rzcomm22.rz.tu-bs.de [134.169.9.68]) > (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) > (No client certificate requested) > by mailgw.mcs.anl.gov (Postfix) with ESMTP id 5BAD2348002 > for ; Wed, 28 May 2008 05:30:14 -0500 (CDT) > Received: from [134.169.59.41] (seraph.infam.bau.tu-bs.de [134.169.59.41]) > by rzcomm22.rz.tu-bs.de (8.13.8/8.13.8) with ESMTP id m4SATmG0019828 > for ; Wed, 28 May 2008 12:29:49 +0200 > (envelope-from m.schauer at tu-bs.de) > Message-ID: <483D341B.6060100 at tu-bs.de> > Date: Wed, 28 May 2008 12:29:47 +0200 > From: Marco Schauer > User-Agent: Thunderbird 2.0.0.14 (Windows/20080421) > MIME-Version: 1.0 > To: petsc-users at mcs.anl.gov > Subject: How to use SNES > Content-Type: text/plain; charset=UTF-8; format=flowed > Content-Transfer-Encoding: 8bit > X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov > X-Spam-Status: No, hits=0.0 required=5.0 > tests=USER_AGENT > version=2.55 > X-Spam-Level: > X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp) > X-MCS-Mail-Loop: petsc-users > > Hello, > I???d like to compute a nonlinear equation system. My system looks like > this: K(u)*u=f, in which u, f are vectors and K is a Matrix that depends > on u. I have already a function to calculate K, but I don???t now how can > I use PETSc to solve this system? The first thing to do is formulate the system as F(u) = 0, so that would be F(u) = K(u)*u - f This is directly soluble with the option -snes_fd. If later you want to provide the Jacobian, you can. Matt > Thanks for support, kind regard > Marco Schauer -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Lars.Rindorf at teknologisk.dk Wed May 28 09:07:05 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Wed, 28 May 2008 16:07:05 +0200 Subject: Slow MatSetValues Message-ID: Hi everybody I have a problem with MatSetValues, since the building of my matrix takes much longer (35 s) than its solution (0.2 s). When the number of degrees of freedom is increased, then the problem worsens. The rate of which the elements of the (sparse) matrix is set also seems to decrease with the number of elements already set. That is, it becomes slower near the end. 
The structure of my program is something like: for element in finite elements for dof in element for equations in FEM formulation ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); where i,j,k,l are appropriate integers and tmp is a double value to be added. The code has fine worked with previous version of petsc (not compiled by me). The version of petsc that I use is slightly newer (I think), 2.3.3 vs ~2.3. Is it something of an dynamic allocation problem? I have tried using MatSetValuesBlock, but this is only slightly faster. If I monitor the program's CPU and memory consumption then the CPU is 100 % used and the memory consumption is only 20-30 mb. My computer is a red hat linux with a xeon quad core processor. I use Intel's MKL blas and lapack. What should I do to speed up the petsc? Kind regards Lars _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Wed May 28 09:18:43 2008 From: zonexo at gmail.com (Ben Tay) Date: Wed, 28 May 2008 22:18:43 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: References: <483942C7.7080705@gmail.com> Message-ID: <483D69C3.7030002@gmail.com> Hi Lisandro, Thank you for your help. But I believe the matrix of the serial and parallel, if done correctly are the same during viewing. However, the vectors will be split into different processors. Regards. Lisandro Dalcin wrote: > On 5/25/08, Ben Tay wrote: > >> What is the best way to compare the matrices from the 2 different code? >> >> I guess the most direct mtd is to use MatView to store the matrix in a >> ACSII file and spot the difference between the 2 files. However, I can't >> seem to get it right. What I did is: >> > > But then take into account that in the parallel case, the numbering of > the rows/cols of the matrix will perhaps not match the one of the > sequential case. If you save the parallel matrix, you will also need > to save an appropriate permutation to be able to actually compare the > matrices. Or perhaps better, a combination of an application ordering, > an index set, and MatPermute(). > > > From gdiso at ustc.edu Wed May 28 09:48:14 2008 From: gdiso at ustc.edu (Gong Ding) Date: Wed, 28 May 2008 22:48:14 +0800 Subject: MatZeroRows and MatAssembly Message-ID: Hi, I meet some trouble about my code. I'd like to use MatZeroRows to perform Dirichlet boundary condition. However, it requires MatAssembly, which packs the sparse matrix. Then the none zero pattern seems to be freezed. The continued MatSetValues may be very low efficient if the item not in the previous none zero pattern.. Is there any way to use MatZeroRows without freeze the none zero pattern? Or I have to redesign my code.... 
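Barry's answer below is to impose the Dirichlet rows only after the matrix has been completely constructed, so that the assembled nonzero pattern is never touched again. A minimal sketch of that order of operations, with placeholder arrays for the boundary rows and values; the four-argument MatZeroRows shown is the 2.3.x calling sequence (newer PETSc versions take two extra Vec arguments):

  #include "petscmat.h"

  /* Finish assembling the full matrix, then turn the Dirichlet rows into
     identity rows and put the boundary values into the right-hand side. */
  PetscErrorCode ApplyDirichlet(Mat A, Vec b, PetscInt nbc,
                                PetscInt bcrows[], PetscScalar bcvals[])
  {
    PetscErrorCode ierr;

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatZeroRows(A, nbc, bcrows, 1.0);CHKERRQ(ierr);  /* rows -> identity rows */
    ierr = VecSetValues(b, nbc, bcrows, bcvals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
    return 0;
  }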
Regards Gong Ding From bsmith at mcs.anl.gov Wed May 28 10:03:57 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 May 2008 10:03:57 -0500 Subject: Slow MatSetValues In-Reply-To: References: Message-ID: http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html Also, slightl less important, collapse the 4 MatSetValues() below into a single call that does the little two by two block Barry On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > Hi everybody > > I have a problem with MatSetValues, since the building of my matrix > takes much longer (35 s) than its solution (0.2 s). When the number > of degrees of freedom is increased, then the problem worsens. The > rate of which the elements of the (sparse) matrix is set also seems > to decrease with the number of elements already set. That is, it > becomes slower near the end. > > The structure of my program is something like: > > for element in finite elements > for dof in element > for equations in FEM formulation > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > where i,j,k,l are appropriate integers and tmp is a double value to > be added. > > The code has fine worked with previous version of petsc (not > compiled by me). The version of petsc that I use is slightly newer > (I think), 2.3.3 vs ~2.3. > > Is it something of an dynamic allocation problem? I have tried using > MatSetValuesBlock, but this is only slightly faster. If I monitor > the program's CPU and memory consumption then the CPU is 100 % used > and the memory consumption is only 20-30 mb. > > My computer is a red hat linux with a xeon quad core processor. I > use Intel's MKL blas and lapack. > > What should I do to speed up the petsc? > > Kind regards > Lars > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > > From bsmith at mcs.anl.gov Wed May 28 10:06:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 May 2008 10:06:04 -0500 Subject: MatZeroRows and MatAssembly In-Reply-To: References: Message-ID: On May 28, 2008, at 9:48 AM, Gong Ding wrote: > Hi, > I meet some trouble about my code. > I'd like to use MatZeroRows to perform Dirichlet boundary condition. > However, it requires MatAssembly, which packs the sparse matrix. > Then the none zero pattern seems to be freezed. Not froozen, but yes adding more nonzeros later will be inefficient. > The continued MatSetValues > may be very low efficient if the item not in the previous none zero > pattern.. Yes > > > Is there any way to use MatZeroRows without freeze the none zero > pattern? No. > > Or I have to redesign my code.... Generally you can apply the Dirichlet boundary conditions after you have completely constructed the matrix. Barry > > > Regards > Gong Ding > > From billy at dem.uminho.pt Thu May 29 15:50:23 2008 From: billy at dem.uminho.pt (=?iso-8859-1?Q?Billy_Ara=FAjo?=) Date: Thu, 29 May 2008 21:50:23 +0100 Subject: Slow MatSetValues References: Message-ID: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Hi, I just want to share my experience with FE assembly. 
I think the problem of preallocation in finite element matrices is that you don't know how many elements are connected to a given node, there can be 5, 20 elements or more. You can build a structure with the number of nodes connected to a node and then preallocate the matrix but this is not very efficient. I know UMFPACK has a method of forming triplets with the matrix information and then it has routines to add duplicate entries and compress the data in a compressed matrix format. Although I have never used UMFPACK with PETSC. I also don't know if there are similiar functions in PETSC optimized for FE matrix assembly. Regards, Billy. -----Mensagem original----- De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith Enviada: qua 28-05-2008 16:03 Para: petsc-users at mcs.anl.gov Assunto: Re: Slow MatSetValues http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html Also, slightl less important, collapse the 4 MatSetValues() below into a single call that does the little two by two block Barry On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > Hi everybody > > I have a problem with MatSetValues, since the building of my matrix > takes much longer (35 s) than its solution (0.2 s). When the number > of degrees of freedom is increased, then the problem worsens. The > rate of which the elements of the (sparse) matrix is set also seems > to decrease with the number of elements already set. That is, it > becomes slower near the end. > > The structure of my program is something like: > > for element in finite elements > for dof in element > for equations in FEM formulation > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > where i,j,k,l are appropriate integers and tmp is a double value to > be added. > > The code has fine worked with previous version of petsc (not > compiled by me). The version of petsc that I use is slightly newer > (I think), 2.3.3 vs ~2.3. > > Is it something of an dynamic allocation problem? I have tried using > MatSetValuesBlock, but this is only slightly faster. If I monitor > the program's CPU and memory consumption then the CPU is 100 % used > and the memory consumption is only 20-30 mb. > > My computer is a red hat linux with a xeon quad core processor. I > use Intel's MKL blas and lapack. > > What should I do to speed up the petsc? > > Kind regards > Lars > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Thu May 29 16:49:42 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 May 2008 16:49:42 -0500 Subject: Slow MatSetValues In-Reply-To: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Message-ID: <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> Partition the elements across the processes, then partition the nodes across processes (try to make sure that each node is on the same process of at least one of its elements), create 1) three parallel vectors with the number of local owned nodes on each process call these vectors off and on and owner; fill the on vector with a 1 in each location, fill the vector owner with rank for each element 2) three sequential vectors on each process with the total number of nodes of all the elements of that process (this is the locally owned plus ghosted nodes) call these vectors ghostedoff and ghostedon and ghostedowner 3) a VecScatter from the "locally owned plus ghosted nodes" to the "local owned nodes" [you need these anyways for the numerical part of the code when you evaluate your nonlinear functions (or right hand side for linear problems) scatter the owner vector to the ghostedowner vector now on each process loop over the locally owned ELEMENTS for each node1 in that element for each node2 in that element (excluding the node1 in the outer loop) if node1 and node2 share an edge (face in 3d) and that edge (face in 3d) is not a boundary edge (face in 3d) set t = .5 (this prevents double counting of these couplings) else set t = 1.0 if node1 and node2 are both owned by the same process** addt t into ghostedon at both the node1 location and the node2 location if node1 and node2 are owned by different processes add t into ghostedoff at both the node1 and node2 location Do a VecScatter add from the ghostedoff and ghostedon into the off and on. The off and on now contain exactly the preallocation need for each processes preallocation. The amount of work required is proportional to the number of elements times the (number of nodes on an element)^2, the amount of memory needed is roughly three global vectors and three local vectors. This is much less work and memory then needed in the numerical part of the code hence is very efficient. In fact it is likely much cheaper than a single nonlinear function evaluation. Barry ** two nodes are owned by the same process if ghostedowner of node1 matches ghostedowner of node2 On May 29, 2008, at 3:50 PM, Billy Ara?jo wrote: > > Hi, > > I just want to share my experience with FE assembly. > I think the problem of preallocation in finite element matrices is > that you don't know how many elements are connected to a given node, > there can be 5, 20 elements or more. You can build a structure with > the number of nodes connected to a node and then preallocate the > matrix but this is not very efficient. > > I know UMFPACK has a method of forming triplets with the matrix > information and then it has routines to add duplicate entries and > compress the data in a compressed matrix format. Although I have > never used UMFPACK with PETSC. I also don't know if there are > similiar functions in PETSC optimized for FE matrix assembly. > > Regards, > > Billy. 
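The per-row totals produced by this counting pass (the "on" and "off" vectors) are exactly what the d_nnz and o_nnz arguments of MatCreateMPIAIJ, linked earlier in the thread, expect; once they are correct, the MatSetValues loop triggers no further mallocs. A minimal sketch of that final step, with placeholder names for the local row count and the two count arrays:

  #include "petscmat.h"

  /* Create a parallel AIJ matrix preallocated from per-row counts:
     d_nnz[i] = nonzeros of local row i in the diagonal block,
     o_nnz[i] = nonzeros of local row i in the off-diagonal block. */
  PetscErrorCode CreatePreallocatedFEMatrix(MPI_Comm comm, PetscInt nlocal,
                                            PetscInt d_nnz[], PetscInt o_nnz[],
                                            Mat *A)
  {
    PetscErrorCode ierr;

    ierr = MatCreateMPIAIJ(comm, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE,
                           0, d_nnz, 0, o_nnz, A);CHKERRQ(ierr);
    return 0;
  }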
> > > > -----Mensagem original----- > De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith > Enviada: qua 28-05-2008 16:03 > Para: petsc-users at mcs.anl.gov > Assunto: Re: Slow MatSetValues > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > > Also, slightl less important, collapse the 4 MatSetValues() below > into a single call that does the little two by two block > > Barry > > On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > > > Hi everybody > > > > I have a problem with MatSetValues, since the building of my matrix > > takes much longer (35 s) than its solution (0.2 s). When the number > > of degrees of freedom is increased, then the problem worsens. The > > rate of which the elements of the (sparse) matrix is set also seems > > to decrease with the number of elements already set. That is, it > > becomes slower near the end. > > > > The structure of my program is something like: > > > > for element in finite elements > > for dof in element > > for equations in FEM formulation > > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > > > > where i,j,k,l are appropriate integers and tmp is a double value to > > be added. > > > > The code has fine worked with previous version of petsc (not > > compiled by me). The version of petsc that I use is slightly newer > > (I think), 2.3.3 vs ~2.3. > > > > Is it something of an dynamic allocation problem? I have tried using > > MatSetValuesBlock, but this is only slightly faster. If I monitor > > the program's CPU and memory consumption then the CPU is 100 % used > > and the memory consumption is only 20-30 mb. > > > > My computer is a red hat linux with a xeon quad core processor. I > > use Intel's MKL blas and lapack. > > > > What should I do to speed up the petsc? > > > > Kind regards > > Lars > > _____________________________ > > > > > > Lars Rindorf > > M.Sc., Ph.D. > > > > http://www.dti.dk > > > > Danish Technological Institute > > Gregersensvej > > > > 2630 Taastrup > > > > Denmark > > Phone +45 72 20 20 00 > > > > > > From dalcinl at gmail.com Thu May 29 16:44:56 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 29 May 2008 18:44:56 -0300 Subject: Slow MatSetValues In-Reply-To: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Message-ID: I will not buy at first that things can be done better than looping over elements filling a std::vector< std::set >, next filling a vector with each row size, next preallocating the AIJ matrix, and finally another loop filling matrix rows with zeros, ones, or garbage. All this are about 10-20 lines of code using simple C++ STL containers and a few calls to PETSc. If anyone has a better way, and can demostrate it with actual timing numbers and some self contained example code, the perhaps I can take the effort of adding something like this to PETSc. But I doubt that this effort is going to pay something ;-). On 5/29/08, Billy Ara?jo wrote: > Hi, > > I just want to share my experience with FE assembly. 
> I think the problem of preallocation in finite element matrices is that you > don't know how many elements are connected to a given node, there can be 5, > 20 elements or more. You can build a structure with the number of nodes > connected to a node and then preallocate the matrix but this is not very > efficient. > > I know UMFPACK has a method of forming triplets with the matrix information > and then it has routines to add duplicate entries and compress the data in a > compressed matrix format. Although I have never used UMFPACK with PETSC. I > also don't know if there are similiar functions in PETSC optimized for FE > matrix assembly. > > Regards, > > Billy. > > > > -----Mensagem original----- > De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith > Enviada: qua 28-05-2008 16:03 > Para: petsc-users at mcs.anl.gov > Assunto: Re: Slow MatSetValues > > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > > Also, slightl less important, collapse the 4 MatSetValues() below > into a single call that does the little two by two block > > Barry > > On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > > > Hi everybody > > > > I have a problem with MatSetValues, since the building of my matrix > > takes much longer (35 s) than its solution (0.2 s). When the number > > of degrees of freedom is increased, then the problem worsens. The > > rate of which the elements of the (sparse) matrix is set also seems > > to decrease with the number of elements already set. That is, it > > becomes slower near the end. > > > > The structure of my program is something like: > > > > for element in finite elements > > for dof in element > > for equations in FEM formulation > > ierr = > MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > > > > where i,j,k,l are appropriate integers and tmp is a double value to > > be added. > > > > The code has fine worked with previous version of petsc (not > > compiled by me). The version of petsc that I use is slightly newer > > (I think), 2.3.3 vs ~2.3. > > > > Is it something of an dynamic allocation problem? I have tried using > > MatSetValuesBlock, but this is only slightly faster. If I monitor > > the program's CPU and memory consumption then the CPU is 100 % used > > and the memory consumption is only 20-30 mb. > > > > My computer is a red hat linux with a xeon quad core processor. I > > use Intel's MKL blas and lapack. > > > > What should I do to speed up the petsc? > > > > Kind regards > > Lars > > _____________________________ > > > > > > Lars Rindorf > > M.Sc., Ph.D. 
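A sketch of the 10-20 line recipe described above, under the assumption of one degree of freedom per node and a sequential matrix; the container layout used for the element connectivity is an illustrative choice, not code from the original post:

  #include <vector>
  #include <set>
  #include "petscmat.h"

  /* One pass over the elements collects the column set of every row, the set
     sizes give the per-row preallocation, and the subsequent MatSetValues
     assembly then runs without mallocs. */
  PetscErrorCode PreallocateFromElements(PetscInt nnodes,
                                         const std::vector< std::vector<PetscInt> > &elements,
                                         Mat *A)
  {
    std::vector< std::set<PetscInt> > adj(nnodes);
    for (size_t e = 0; e < elements.size(); e++) {
      const std::vector<PetscInt> &nodes = elements[e];
      for (size_t i = 0; i < nodes.size(); i++)
        for (size_t j = 0; j < nodes.size(); j++)
          adj[nodes[i]].insert(nodes[j]);   /* every node couples to every node of the element */
    }

    std::vector<PetscInt> nnz(nnodes);
    for (PetscInt row = 0; row < nnodes; row++) nnz[row] = (PetscInt)adj[row].size();

    PetscErrorCode ierr;
    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nnodes, nnodes, 0, &nnz[0], A);CHKERRQ(ierr);
    return 0;
  }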
> > > > http://www.dti.dk > > > > Danish Technological Institute > > Gregersensvej > > > > 2630 Taastrup > > > > Denmark > > Phone +45 72 20 20 00 > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bsmith at mcs.anl.gov Thu May 29 20:21:08 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 May 2008 20:21:08 -0500 Subject: Slow MatSetValues In-Reply-To: <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> Message-ID: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov> I realize I made a mistake for three dimensions below; when nodes share an edge in 3d they will over counted. The fix is to have another array with one entry per edge that gives the number of elements that contain that edge. Then use if node1 and node2 share an edge then t = 1/ elementsperedge[edge that connects node1 and node2] > else if node1 and node2 share an face in 3d and that > face in 3d is not a boundary face set t = .5 (this prevents double > counting of these couplings) > else set t = 1.0 This increases the complexity of the code a bit but is still very rapid. Barry On May 29, 2008, at 4:49 PM, Barry Smith wrote: > > Partition the elements across the processes, > > then partition the nodes across processes (try to make sure that > each node is on the same process of at least one of its elements), > > create > 1) three parallel vectors with the number of local owned nodes > on each process > call these vectors off and on and owner; fill the on vector > with a 1 in each location, fill the vector owner with rank for each > element > 2) three sequential vectors on each process with the total > number of nodes of all the elements of that process (this is the > locally owned plus ghosted nodes) > call these vectors ghostedoff and ghostedon and ghostedowner > 3) a VecScatter from the "locally owned plus ghosted nodes" to > the "local owned nodes" > [you need these anyways for the numerical part of the code when > you evaluate your nonlinear functions (or right hand side for linear > problems) > > scatter the owner vector to the ghostedowner vector > now on each process loop over the locally owned ELEMENTS > for each node1 in that element > for each node2 in that element (excluding the node1 in the > outer loop) > if node1 and node2 share an edge (face in 3d) and > that edge (face in 3d) is not a boundary edge (face in 3d) set t = . > 5 (this prevents double counting of these couplings) > else set t = 1.0 > if node1 and node2 are both owned by the same > process** addt t into ghostedon at both the node1 location and the > node2 location > if node1 and node2 are owned by different processes > add t into ghostedoff at both the node1 and node2 location > > Do a VecScatter add from the ghostedoff and ghostedon into the off > and on. > > The off and on now contain exactly the preallocation need for each > processes preallocation. > > The amount of work required is proportional to the number of > elements times the (number of nodes on an element)^2, the amount of > memory > needed is roughly three global vectors and three local vectors. 
>    This is much less work and memory than is needed in the numerical part
> of the code, hence it is very efficient. In fact it is likely much cheaper
> than a single nonlinear function evaluation.
>
>    Barry
>
> ** two nodes are owned by the same process if ghostedowner of node1
> matches ghostedowner of node2
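[Barry's recipe computes exact counts in parallel. For a sequential code, even a much cruder counting pass is usually enough to make MatSetValues() fast. The sketch below is not Barry's algorithm: it assumes one scalar unknown per node and a flat element-to-node connectivity array, and it only computes an upper bound per row, over-counting couplings that several elements share; the price is some wasted memory, never a missing preallocation. The function name and arguments are illustrative.]

    #include "petscmat.h"

    /* Crude serial preallocation from element connectivity: every element that
       touches a node contributes (nodes_per_el - 1) potential couplings to that
       node's row, plus 1 for the diagonal.  nnz[] is an upper bound, never an
       underestimate, because shared neighbours are counted once per element. */
    PetscErrorCode PreallocateFromElements(PetscInt nnodes, PetscInt nelem,
                                           PetscInt nodes_per_el,
                                           const PetscInt *elem, /* nelem*nodes_per_el node indices */
                                           Mat *A)
    {
      PetscErrorCode ierr;
      PetscInt       *nnz, e, a;

      PetscFunctionBegin;
      ierr = PetscMalloc(nnodes*sizeof(PetscInt), &nnz);CHKERRQ(ierr);
      for (a = 0; a < nnodes; a++) nnz[a] = 1;                       /* the diagonal entry */
      for (e = 0; e < nelem; e++) {
        for (a = 0; a < nodes_per_el; a++) {
          nnz[elem[e*nodes_per_el + a]] += nodes_per_el - 1;         /* couplings inside element e */
        }
      }
      for (a = 0; a < nnodes; a++) if (nnz[a] > nnodes) nnz[a] = nnodes;  /* cannot exceed the row length */
      ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nnodes, nnodes, 0, nnz, A);CHKERRQ(ierr);
      ierr = PetscFree(nnz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

[An exact count, avoiding the over-allocation, needs the kind of edge/face bookkeeping Barry describes above.]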
From Lars.Rindorf at teknologisk.dk Fri May 30 06:44:12 2008
From: Lars.Rindorf at teknologisk.dk (Lars Rindorf)
Date: Fri, 30 May 2008 13:44:12 +0200
Subject: SV: Slow MatSetValues
In-Reply-To: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov>
Message-ID:

Hi everybody

Thanks for all the suggestions and help. The problem is of a somewhat different
nature. I use only direct solvers, so I give the options "-ksp_type preonly
-pc_type lu" to get a standard LU factorization. This works fine without any
problems. If I additionally set "-mat_type umfpack" to use UMFPACK, then
MatSetValues is very, very slow (about 50 times slower). If, as a test, I call
MatAssemblyBegin and MatAssemblyEnd before MatSetValues, and only use the LU
(no UMFPACK), then the performance is similarly slow.

My code is otherwise identical in its PETSc setup to that at
http://www-unix.mcs.anl.gov/petsc/petsc-2/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex8.c.html

There is no need to invoke MatAssemblyBegin() with the argument
MAT_FLUSH_ASSEMBLY, since MatSetValues is only given the ADD_VALUES argument.
So it is not that.

Is there some conflict between the matrix format used by UMFPACK and something
else?

KR, Lars
From jed at 59A2.org Fri May 30 07:31:09 2008
From: jed at 59A2.org (Jed Brown)
Date: Fri, 30 May 2008 14:31:09 +0200
Subject: SV: Slow MatSetValues
In-Reply-To:
References: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov>
Message-ID: <20080530123109.GB3835@brakk.ethz.ch>

I've seen this problem when preallocation information is lost by changing the
matrix type. Try putting MatSeqAIJSetPreallocation() and/or (it doesn't hurt to
do both) MatMPIAIJSetPreallocation() after MatSetFromOptions(). This will
preallocate for any matrix type that inherits from these two types (which I
think is anything you might use).

Jed
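[Following Jed's suggestion, the creation sequence could look something like the sketch below. The helper name is illustrative; d_nnz/o_nnz are per-row counts computed beforehand (for a sequential matrix only the first array matters), and, as Jed notes, the preallocation call that does not match the final type simply has no effect.]

    #include "petscmat.h"

    /* Create a matrix whose type can still be changed on the command line
       (e.g. -mat_type aij / umfpack / ...) and preallocate it AFTER the type
       has been fixed by MatSetFromOptions(). */
    PetscErrorCode CreatePreallocatedMat(MPI_Comm comm, PetscInt mlocal, PetscInt N,
                                         const PetscInt *d_nnz, const PetscInt *o_nnz,
                                         Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, mlocal, mlocal, N, N);CHKERRQ(ierr);
      ierr = MatSetFromOptions(*A);CHKERRQ(ierr);                   /* type is fixed here       */
      ierr = MatSeqAIJSetPreallocation(*A, 0, d_nnz);CHKERRQ(ierr); /* after MatSetFromOptions() */
      ierr = MatMPIAIJSetPreallocation(*A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }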
From gdiso at ustc.edu Fri May 30 10:19:32 2008
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 30 May 2008 23:19:32 +0800
Subject: problem of parallel MatAssembled
Message-ID: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>

Hi,
I use MatAssembled to determine if a parallel matrix (MPIAIJ) is assembled.
One processor says 1 and another says 0. (PETSc 2.3.3-p2)
The correct answer should be the same on all the processors, I think.
Is it a bug, or did I forget something?

BTW. I think a function such as MatAddRowToRow would be useful.
I had implemented it with MatGetRow and MatSetValues (or use Mat*Mat?).
However, writing it at a lower level should be more efficient. If the
developers could add this to PETSc...

Regards
Gong Ding

From dalcinl at gmail.com Fri May 30 10:37:00 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 30 May 2008 12:37:00 -0300
Subject: problem of parallel MatAssembled
In-Reply-To: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
References: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
Message-ID:

On 5/30/08, Gong Ding wrote:
> Hi,
> I use MatAssembled to determine if a parallel matrix (MPIAIJ) is assembled.
> One processor says 1 and another says 0. (PETSc 2.3.3-p2)
> The correct answer should be the same on all the processors, I think.
> Is it a bug, or did I forget something?

Are you completely sure that you collectively called at ALL processes

MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)

Please note the MAT_FINAL_ASSEMBLY. If you used MAT_FLUSH_ASSEMBLY at
some process, then what you get is expected.

> BTW. I think a function such as MatAddRowToRow would be useful.
> I had implemented it with MatGetRow and MatSetValues (or use Mat*Mat?).

Not sure what you are trying to do, please elaborate a bit more. Why is a
call like this

PetscInt row = ...
PetscInt ncols = ...
PetscInt *cols_indices = ...
PetscScalar *cols_values = ...

MatSetValues(A, 1, &row, ncols, cols_indices, cols_values, ADD_VALUES)

not enough for you?

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From gdiso at ustc.edu Fri May 30 11:36:03 2008
From: gdiso at ustc.edu (Gong Ding)
Date: Sat, 31 May 2008 00:36:03 +0800
Subject: problem of parallel MatAssembled
References: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
Message-ID:

----- Original Message -----
From: "Lisandro Dalcin"
To:
Sent: Friday, May 30, 2008 11:37 PM
Subject: Re: problem of parallel MatAssembled

> Are you completely sure that you collectively called at ALL processes
>
> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)
> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)
>
> Please note the MAT_FINAL_ASSEMBLY. If you used MAT_FLUSH_ASSEMBLY at
> some process, then what you get is expected.

I will check it.

> Not sure what you are trying to do, please elaborate a bit more. Why is a
> call like this
>
> MatSetValues(A, 1, &row, ncols, cols_indices, cols_values, ADD_VALUES)
>
> not enough for you?

I am using a somewhat involved approach to deal with a multi-material domain.
The domain is decomposed into several regions with different materials. Each
region has its own governing equation. As a result, I first build the equations
on each region, then process the region interfaces and boundaries. More than 10
boundary types exist in my problem.

For a node located on a region interface, the two regions both own a copy of
the node, so I need to combine the governing equations of the two nodes. For a
variable with a continuous value on the interface, the sum of the two equations
should be zero and the values of the two nodes should be equal. So I need to
add some rows to other rows in the PETSc vector and add an equation v1-v2=0.

The equation is nonlinear, so I have to evaluate a Jacobian matrix, which is
done by AD from the equation evaluation in each region. So I also need to add
some rows of the Jacobian matrix to other rows.

I wonder how others handle multi-material problems?

G.D.
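[As a rough illustration of the MatGetRow()/MatSetValues() combination Gong Ding mentions: the sketch below adds row src of A into row dst. The function name is illustrative, not PETSc API; it assumes A is already assembled, both rows are locally owned, and the caller re-assembles A after all such additions. The row is copied first so that MatSetValues() is not called while MatGetRow() still holds the row data.]

    #include "petscmat.h"

    /* Add row "src" of A into row "dst" of A with ADD_VALUES. */
    PetscErrorCode MatAddRowToRow_Sketch(Mat A, PetscInt src, PetscInt dst)
    {
      PetscErrorCode     ierr;
      PetscInt           n, ncols, *ccols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscScalar       *cvals;

      PetscFunctionBegin;
      ierr = MatGetRow(A, src, &ncols, &cols, &vals);CHKERRQ(ierr);
      n    = ncols;
      ierr = PetscMalloc(n*sizeof(PetscInt), &ccols);CHKERRQ(ierr);
      ierr = PetscMalloc(n*sizeof(PetscScalar), &cvals);CHKERRQ(ierr);
      ierr = PetscMemcpy(ccols, cols, n*sizeof(PetscInt));CHKERRQ(ierr);
      ierr = PetscMemcpy(cvals, vals, n*sizeof(PetscScalar));CHKERRQ(ierr);
      ierr = MatRestoreRow(A, src, &ncols, &cols, &vals);CHKERRQ(ierr);
      ierr = MatSetValues(A, 1, &dst, n, ccols, cvals, ADD_VALUES);CHKERRQ(ierr);
      ierr = PetscFree(ccols);CHKERRQ(ierr);
      ierr = PetscFree(cvals);CHKERRQ(ierr);
      /* caller: MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY) after all additions */
      PetscFunctionReturn(0);
    }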
From keita at cray.com Fri May 30 16:04:10 2008
From: keita at cray.com (Keita Teranishi)
Date: Fri, 30 May 2008 16:04:10 -0500
Subject: Support for SuperLU_Dist 2.2
Message-ID: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>

Hi,

Does PETSc support SuperLU_DIST version 2.2? I am interested in using the new
SuperLU through PETSc's KSP interface.

Thanks,

================================
Keita Teranishi
Math Software Group
Cray, Inc.
keita at cray.com
================================

From balay at mcs.anl.gov Fri May 30 16:16:26 2008
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 30 May 2008 16:16:26 -0500 (CDT)
Subject: Support for SuperLU_Dist 2.2
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>
Message-ID:

SuperLU_DIST 2.2 support appears to be in petsc-dev.

Satish

From zonexo at gmail.com Fri May 30 17:33:03 2008
From: zonexo at gmail.com (Ben Tay)
Date: Sat, 31 May 2008 06:33:03 +0800
Subject: Solving 2 eqns at the same time in PETSc
Message-ID: <4840809F.5050705@gmail.com>

Hi,

(I sent this email a while ago but I used a different email address. Not sure
if it got through since it's not registered in the server list. Sorry if it was
resent.)

I obtain 2 linear eqns (u and v velocity) from the momentum eqn in my CFD code.
Instead of solving eqn 1 in parallel, and then subsequently eqn 2 in parallel,
I am thinking of solving the 2 eqns at the same time, using half the number of
processors on each eqn. In other words, when using 4 processors, I use 2
processors for eqn 1 and 2 processors for eqn 2. Will that be possible?

I thought that in MPI, if an equation is divided among too many processors, its
scaling factor will decrease. So by dividing it among fewer processors and
solving the equations simultaneously, it should give better performance. Is
that true?

I've also successfully coded 1 eqn to be solved in parallel in PETSc. What
changes do I have to make now?

Thank you very much.

Regards.

From bsmith at mcs.anl.gov Fri May 30 21:48:13 2008
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 30 May 2008 21:48:13 -0500
Subject: Solving 2 eqns at the same time in PETSc
In-Reply-To: <4840809F.5050705@gmail.com>
References: <4840809F.5050705@gmail.com>
Message-ID:

On May 30, 2008, at 5:33 PM, Ben Tay wrote:

> I obtain 2 linear eqns (u and v velocity) from the momentum eqn in
> my CFD code. Instead of solving eqn 1 in parallel, and then
> subsequently eqn 2 in parallel, I am thinking of solving the 2 eqns
> at the same time, using half the number of processors on each eqn.
> In other words, when using 4 processors, I use 2 processors for eqn
> 1 and 2 processors for eqn 2. Will that be possible?

    You simply use the MPI_Group and MPI_Comm commands to make MPI
communicators for the subsets of processes and then construct the Vec, Mat,
and KSP based on those new communicators.

    Barry
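[A minimal sketch of what Barry describes, using MPI_Comm_split() for brevity rather than the MPI_Group routines. The function and variable names are illustrative, n is the global size of each system, and the identity-matrix fill is only a stand-in for the real u- and v-momentum coefficients.]

    #include "petscksp.h"

    /* Split PETSC_COMM_WORLD into two halves; each half builds and solves
       its own linear system concurrently on a half-sized communicator. */
    PetscErrorCode SolveTwoSystemsConcurrently(PetscInt n)
    {
      PetscErrorCode ierr;
      PetscMPIInt    rank, size, color;
      MPI_Comm       subcomm;
      Mat            A;
      Vec            x, b;
      KSP            ksp;
      PetscInt       i, rstart, rend;
      PetscScalar    one = 1.0;

      PetscFunctionBegin;
      ierr  = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
      ierr  = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
      color = (rank < size/2) ? 0 : 1;             /* 0 -> u equation, 1 -> v equation */
      ierr  = MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);CHKERRQ(ierr);

      /* everything below lives on the half-sized communicator */
      ierr = MatCreate(subcomm, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      /* stand-in assembly: identity; a real code inserts the u- or v-momentum
         coefficients here depending on color */
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        ierr = MatSetValues(A, 1, &i, 1, &i, &one, INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = MatGetVecs(A, &x, &b);CHKERRQ(ierr);
      ierr = VecSet(b, 1.0);CHKERRQ(ierr);
      ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

      ierr = KSPDestroy(ksp);CHKERRQ(ierr);
      ierr = VecDestroy(x);CHKERRQ(ierr);
      ierr = VecDestroy(b);CHKERRQ(ierr);
      ierr = MatDestroy(A);CHKERRQ(ierr);
      ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }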
From bsmith at mcs.anl.gov Fri May 30 22:09:17 2008
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 30 May 2008 22:09:17 -0500
Subject: SV: Slow MatSetValues
In-Reply-To: <20080530123109.GB3835@brakk.ethz.ch>
References: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov> <20080530123109.GB3835@brakk.ethz.ch>
Message-ID:

    This is a serious flaw in our user interface; I'm fixing it now, and our
next release will vastly simplify the handling of external direct solvers and
make this problem impossible.

    Barry

On May 30, 2008, at 7:31 AM, Jed Brown wrote:

> I've seen this problem when preallocation information is lost by changing the
> matrix type. Try putting MatSeqAIJSetPreallocation() and/or (it doesn't hurt to
> do both) MatMPIAIJSetPreallocation() after MatSetFromOptions(). This will
> preallocate for any matrix type that inherits from these two types (which I
> think is anything you might use).
>
> Jed