From recrusader at gmail.com Thu May 1 11:33:51 2008 From: recrusader at gmail.com (Yujie) Date: Thu, 1 May 2008 09:33:51 -0700 Subject: about MatMatMult() Message-ID: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> I have further checked this function. In MatMatMult(Mat A,Mat B,MatReuse scall,PetscReal fill,Mat *C) I am wondering why the type of C is MATAIJ when the types of A and B are MATAIJ. Although A and B are MATAIJ, C should be dense. If C uses the MATAIJ type, it should take more memory, is that right? Thanks a lot. Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu May 1 11:55:28 2008 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 1 May 2008 11:55:28 -0500 (CDT) Subject: about MatMatMult() In-Reply-To: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> References: <7ff0ee010805010933k40d2fbf6q72f5558da87cece1@mail.gmail.com> Message-ID: Yujie, In general, C=A*B is denser than A and B. Thus, sparse matrix products should be avoided. PETSc's sparse MatMatMult() is intended to support the multigrid computation MatPtAP(), in which P is a projector and C=Pt*A*P maintains a similar sparsity. If your C=A*B is dense, you may set A and B in dense format. In the sequential case, PETSc calls LAPACK for MatMatMult(), which is much more efficient than the sparse implementation. Hong On Thu, 1 May 2008, Yujie wrote: > I have further checked this function. > In MatMatMult(Mat A,Mat B,MatReuse scall,PetscReal fill,Mat *C) > I am wondering > why the type of C is MATAIJ when the types of A and B are MATAIJ. > Although A and B are MATAIJ, C should be dense. If C uses the MATAIJ type, > it should take more memory, is that right? > > Thanks a lot. > > Regards, > Yujie > From recrusader at gmail.com Thu May 1 19:08:25 2008 From: recrusader at gmail.com (Yujie) Date: Thu, 1 May 2008 17:08:25 -0700 Subject: further about PCComputeExplicitOperator() Message-ID: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> When 1 processor is used, the matrix M in PCComputeExplicitOperator(pc,&M) uses the MATSEQDENSE type. Now I want to use MATSEQAIJ, so I changed the code as follows: 1563 if (size == 1) { 1564 //05/01/08 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); 1566 //ierr = MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); 1568 ierr = MatSeqAIJSetPreallocation(*mat,0,PETSC_NULL);CHKERRQ(ierr); 1569 1570 } else { 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); 1572 ierr = MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL,0,PETSC_NULL);CHKERRQ(ierr); 1573 } PCApply is fast when running. However, MatSetValues() is very, very slow when some arrays need to be set. After debugging the code, I find that the problem likely lies in MatSeqXAIJReallocateAIJ(A,A->rmap.n,1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) in MatSetValues_SeqAIJ(). I can't figure out where the problem is beyond that, because it is difficult to debug. Could you give me some advice? The version of PETSc is 2.3.3-p8. Thanks a lot. Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... 
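For the dense C=A*B case Hong describes above, a minimal sketch of the dense-format route (PETSc 2.3.x-style C calls; the matrix size n and the values are placeholders, and the actual filling of A and B is elided):

#include "petscmat.h"

int main(int argc,char **argv)
{
  Mat            A,B,C;
  PetscInt       n = 100;                 /* illustrative size only */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  /* build A and B as sequential dense matrices */
  ierr = MatCreateSeqDense(PETSC_COMM_SELF,n,n,PETSC_NULL,&A);CHKERRQ(ierr);
  ierr = MatCreateSeqDense(PETSC_COMM_SELF,n,n,PETSC_NULL,&B);CHKERRQ(ierr);
  /* ... fill A and B with MatSetValues() ... */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* C is created by the call; the fill ratio is not meaningful for dense input */
  ierr = MatMatMult(A,B,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = MatDestroy(B);CHKERRQ(ierr);
  ierr = MatDestroy(C);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

With AIJ input, as in the original question, the product C also comes back as AIJ, which is why storing a dense product that way costs extra memory for the column indices.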
URL: From amjad11 at gmail.com Fri May 2 11:12:20 2008 From: amjad11 at gmail.com (amjad ali) Date: Fri, 2 May 2008 21:12:20 +0500 Subject: Selection between C2D and Xeon 3000 for PETSc Sparse solvers In-Reply-To: References: <428810f20804220643r618753dayb3cae42b9f92b7e7@mail.gmail.com> Message-ID: <428810f20805020912s6bf13502s89fc63fe6044556c@mail.gmail.com> Hello Dr. Satish, I am still bit confused in taking decesion and wanting some more guidence from you and PETSc users/maint. *Question ONE:* Please help me in selecting out one of the following two clusters, *bit updated/changed from my previous choices.* As I am going to make a gigabit ethernet cluster of 4 compute nodes (totaling 8 cores), with each node having: (Choice 1) One Processor: Intel Core2Duo E6750 2.66GHz, FSB 1333MHz, 4MB L2. Motherboard: Intel Desktop Board DX38BT with Intel X38 Chipset supporting 1333/1066/800 MHz system bus. RAM: 2GB DDR3 1333MHz ECC System Memory. (Choice 2) One Processor: Intel Xeon 3075 2.66GHz, FSB1333, 4MBL2. Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 Chipset supporting 1333/1066/800 MHz system bus. RAM: 2GB DDR2 800MHz ECC System Memory. Which one system has larger memory-bandwidth/CPU-core? Better for PETSc? Any other comment/remark? My area work deals in sparse matrices. I near future I would like to add 12 similar compute nodes in the cluster. So my decision should be optimum/long-lasting. *Question TWO:* I want to make ROCKS V cluster on Intel x86_64 machines (selected from the above options). ROCKS V is available separately for both i386 and x86_64 machines. But I have not seen two different version of PETSc for i386 and x86_64 machines. Is there only a single version for PETSc for both i386 and x86_64 machines? IF YES, then out of "ROCKS/OS i386" and "ROCKS/OS x86_64" which one is more suitable (efficiecy/speed/performance wise) for PETSc? Regards, Amjad Ali. On Tue, Apr 22, 2008 at 8:25 PM, Satish Balay wrote: > On Tue, 22 Apr 2008, amjad ali wrote: > > > Hello, > > > > Please help me out in selecting any one choice of the following: > > (Currently I am making a gigabit ethernet cluster of 4 compute nodes > > (totaling 8 cores), with each node having) > > > > (Choice 1) > > One Processor: Intel Core2Duo E6750 2.66 GHz Processor, FSB 1333MHz, 4MB > L2. > > Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 > Chipset > > supporting 1333/1066/800 MHz FSB . > > RAM: 2GB DDR2 800MHz ECC System Memory. > > > > (Choice 2) > > One Processor: Intel Xeon 3075 2.66 GHz FSB1333 4MBL2. > > Motherboard: Intel Entry Server Board Intel S3200SHV with intel 3200 > > Chipset supporting 1333/1066/800 MHz FSB . > > RAM: 2GB DDR2 800MHz ECC System Memory. > > > > Which one system has larger memory-bandwidth/CPU-core? > > Any other comment/remark? > > My area work deals in sparse matrices. > > I near future I would like to add 12 similar compute nodes in the > cluster. > > Based on the above numbers - the memory bandwidth numbers should be > the same. And I expect the performance to be the same in both cases. > > Ideally you would have access to both machines [perhaps from the > vendor] - and run streams benchmark on each - to see if there is any > difference. > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... 
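Satish's suggestion above - compare the two boxes with a streams-type benchmark - can be approximated with a few lines of C if the packaged benchmark is not at hand. This is not the official STREAM code, just an illustration of what is being measured; the array length and the single repetition are arbitrary, and a real measurement would repeat the loop and keep the compiler from optimizing it away:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char **argv)
{
  const int N = 10000000;          /* arbitrary; large enough to exceed cache */
  double    *a,*b,*c,t,scalar = 3.0;
  int       i;

  MPI_Init(&argc,&argv);
  a = (double*)malloc(N*sizeof(double));
  b = (double*)malloc(N*sizeof(double));
  c = (double*)malloc(N*sizeof(double));
  for (i=0; i<N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  t = MPI_Wtime();
  for (i=0; i<N; i++) a[i] = b[i] + scalar*c[i];   /* triad: three doubles moved per iteration */
  t = MPI_Wtime() - t;

  printf("a[0]=%g  approx bandwidth %g MB/s\n",a[0],3.0*8.0*N/t/1.0e6);
  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Running this with 1, 2, ... processes per node (mpiexec -np 2 ./a.out, and so on) and watching the per-process number fall shows how much memory bandwidth each core really gets, which is what limits sparse matrix kernels.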
URL: From bsmith at mcs.anl.gov Fri May 2 11:50:21 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 2 May 2008 12:50:21 -0400 Subject: further about PCComputeExplicitOperator() In-Reply-To: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: Application of a preconditioner is almost always a dense operator. The only exception is sparse approximate inverse and in fact this is why the sparse approximate inverse is a lousy preconditioner. So I don't think you would ever want to compute the preconditioner into a sparse matrix. If you want the SPAI preconditioner explicitly as a PETSc sparse matrix you should go into that code, figure out how the SPAI stores its computed preconditioner and convert it to the PETSc format. Note also: PCComputeExplicitOperator - Computes the explicit preconditioned operator. this means it computes B*A or A*B (depending on left or right preconditioning). This beasty (unless you use Jacobi preconditioning) is always dense and it makes no sense to store in sparse format except for fun. Barry On May 1, 2008, at 8:08 PM, Yujie wrote: > when 1 processor is used, the matrix M in > PCComputeExplicitOperator(pc,&M) uses MATSEQDENSE type. Now, I want > to use MATSEQAIJ, I change the codes as follows: > 1563 if (size == 1) { > 1564 //05/01/08 > 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); > 1566 //ierr = > MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); > 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); > 1568 ierr = MatSeqAIJSetPreallocation(*mat, > 0,PETSC_NULL);CHKERRQ(ierr); > 1569 > 1570 } else { > 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); > 1572 ierr = MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL, > 0,PETSC_NULL);CHKERRQ(ierr); > 1573 } > > PCApply is fast when running. However, MatSetValues() is very very > slow when some arraies need to set. I find that the problem likely > lies in > MatSeqXAIJReallocateAIJ(A,A->rmap.n, > 1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) in > MatSetValues_SeqAIJ() after debugging the codes. > I can't further figure out where is the problem. Because it is > difficult to debug. Could you give me some advice? thanks a lot. > > the version of PETSc is 2.3.3-p8. > > thanks a lot. > > Regards, > Yujie > > > > From dave.mayhem23 at gmail.com Fri May 2 18:19:00 2008 From: dave.mayhem23 at gmail.com (Dave May) Date: Sat, 3 May 2008 09:19:00 +1000 Subject: further about PCComputeExplicitOperator() In-Reply-To: References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> Hi Barry, Does PCComputeExplicitOperator() really build B*A or A*B? The above code appears to just be applying the preconditioner to the each column of the identity matrix, with no reference to the original operator or the preconditioner side. I only wanted to clarify this fact as at one stage I wrote a function to compute B*A or A*B as I had convinced myself that PCComputeExplicitOperator() just assembled the inverse of the preconditioner. Cheers, Dave. On Sat, May 3, 2008 at 2:50 AM, Barry Smith wrote: > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes no > sense to store in > sparse format except for fun. 
> > Barry > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From recrusader at gmail.com Fri May 2 18:26:15 2008 From: recrusader at gmail.com (Yujie) Date: Fri, 2 May 2008 16:26:15 -0700 Subject: further about PCComputeExplicitOperator() In-Reply-To: References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> Message-ID: <7ff0ee010805021626r33afdd02x7db87d6c05a6b521@mail.gmail.com> Dear Barry: "Note also: PCComputeExplicitOperator - Computes the explicit preconditioned operator. this means it computes B*A or A*B (depending on left or right preconditioning). This beasty (unless you use Jacobi preconditioning) is always dense and it makes no sense to store in sparse format except for fun." Regarding the above comments, I have checked the array of the explicit preconditioner obtained in PCComputeExplicitOperator(). It is sparse. Why say it is dense? thanks a lot. Regards, Yujie On Fri, May 2, 2008 at 9:50 AM, Barry Smith wrote: > > Application of a preconditioner is almost always a dense operator. The > only exception is sparse > approximate inverse and in fact this is why the sparse approximate inverse > is a lousy preconditioner. > So I don't think you would ever want to compute the preconditioner into a > sparse matrix. > > If you want the SPAI preconditioner explicitly as a PETSc sparse matrix > you should go into that > code, figure out how the SPAI stores its computed preconditioner and > convert it to the PETSc > format. > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes no > sense to store in > sparse format except for fun. > > Barry > > > > On May 1, 2008, at 8:08 PM, Yujie wrote: > > when 1 processor is used, the matrix M in PCComputeExplicitOperator(pc,&M) > > uses MATSEQDENSE type. Now, I want to use MATSEQAIJ, I change the codes as > > follows: > > 1563 if (size == 1) { > > 1564 //05/01/08 > > 1565 //ierr = MatSetType(*mat,MATSEQDENSE);CHKERRQ(ierr); > > 1566 //ierr = > > MatSeqDenseSetPreallocation(*mat,PETSC_NULL);CHKERRQ(ierr); > > 1567 ierr = MatSetType(*mat,MATSEQAIJ);CHKERRQ(ierr); > > 1568 ierr = MatSeqAIJSetPreallocation(*mat,0,PETSC_NULL);CHKERRQ(ierr); > > 1569 > > 1570 } else { > > 1571 ierr = MatSetType(*mat,MATMPIAIJ);CHKERRQ(ierr); > > 1572 ierr = > > MatMPIAIJSetPreallocation(*mat,0,PETSC_NULL,0,PETSC_NULL);CHKERRQ(ierr); > > 1573 } > > > > PCApply is fast when running. However, MatSetValues() is very very slow > > when some arraies need to set. I find that the problem likely lies in > > MatSeqXAIJReallocateAIJ(A,A->rmap.n,1,nrow,row,col,rmax,aa,ai,aj,rp,ap,imax,nonew,MatScalar) > > in MatSetValues_SeqAIJ() after debugging the codes. > > I can't further figure out where is the problem. Because it is difficult > > to debug. Could you give me some advice? thanks a lot. > > > > the version of PETSc is 2.3.3-p8. > > > > thanks a lot. > > > > Regards, > > Yujie > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
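One way to make the sparse-or-dense question above concrete is to count how many entries of the explicit operator actually exceed a tolerance. A rough single-process sketch, assuming the unmodified PETSc where M comes back as MATSEQDENSE; pc and the 1e-12 cutoff are placeholders:

Mat            M;
PetscScalar    *vals;
PetscInt       m,n,i,nnz = 0;
PetscErrorCode ierr;

ierr = PCComputeExplicitOperator(pc,&M);CHKERRQ(ierr);   /* MATSEQDENSE on one process */
ierr = MatGetSize(M,&m,&n);CHKERRQ(ierr);
ierr = MatGetArray(M,&vals);CHKERRQ(ierr);               /* dense storage, column major */
for (i=0; i<m*n; i++) if (PetscAbsScalar(vals[i]) > 1.e-12) nnz++;
ierr = MatRestoreArray(M,&vals);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_SELF,"%d of %d entries exceed the cutoff\n",nnz,m*n);CHKERRQ(ierr);
ierr = MatDestroy(M);CHKERRQ(ierr);

How many entries fall below such a cutoff depends on the preconditioner and the problem; a small count does not change the storage cost of the dense format, but it does quantify the observation made above.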
URL: From bsmith at mcs.anl.gov Sat May 3 14:42:59 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 3 May 2008 15:42:59 -0400 Subject: further about PCComputeExplicitOperator() In-Reply-To: <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> References: <7ff0ee010805011708m3e6c85c2l6c4b196698c39abf@mail.gmail.com> <956373f0805021619y239b5caeh9330477f2f9fcd7e@mail.gmail.com> Message-ID: <7E0A3675-0B00-49D3-B50F-7E64E07F5144@mcs.anl.gov> On May 2, 2008, at 7:19 PM, Dave May wrote: > Hi Barry, > Does PCComputeExplicitOperator() really build B*A or A*B? > The above code appears to just be applying the preconditioner to the > each column of the identity matrix, with no reference to the > original operator or the preconditioner side. Dave, You are correct; I apologize. There is a KSPComputeExplicitOperator() that computes the preconditioned operator (B*A or A*B) depends on the KSP preconditioner sider. The manual page for PCComputeExplicitOperator() does not mention KSPComputeExplicitOperator(), I have rectified this. Barry > > > I only wanted to clarify this fact as at one stage I wrote a > function to compute B*A or A*B as I had convinced myself that > PCComputeExplicitOperator() just assembled the inverse of the > preconditioner. > > Cheers, > Dave. > > > > On Sat, May 3, 2008 at 2:50 AM, Barry Smith > wrote: > > Note also: > PCComputeExplicitOperator - Computes the explicit preconditioned > operator. > this means it computes B*A or A*B (depending on left or right > preconditioning). This beasty > (unless you use Jacobi preconditioning) is always dense and it makes > no sense to store in > sparse format except for fun. > > Barry > From keita at cray.com Mon May 5 12:20:25 2008 From: keita at cray.com (Keita Teranishi) Date: Mon, 5 May 2008 12:20:25 -0500 Subject: C++ and Fortran Support Message-ID: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> Hi, Just a quick question. Can PETSc support both C++ and Fortran interface together with a single bmake? Thanks, ================================ Keita Teranishi Math Software Group Cray, Inc. keita at cray.com ================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon May 5 12:50:29 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 5 May 2008 12:50:29 -0500 (CDT) Subject: C++ and Fortran Support In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> References: <925346A443D4E340BEB20248BAFCDBDF05516D4A@CFEVS1-IP.americas.cray.com> Message-ID: On Mon, 5 May 2008, Keita Teranishi wrote: > Hi, > > > > Just a quick question. Can PETSc support both C++ and Fortran interface together with a single bmake? yes. the fortran interface will always be built [as long as '--with-fc=' is specified] - irrespective of -with-clanguage=c [or cxx] Satish From jagruti.trivedi at navy.mil Mon May 5 13:10:51 2008 From: jagruti.trivedi at navy.mil (Trivedi, Jagruti CIV 470000D, 474300D) Date: Mon, 5 May 2008 11:10:51 -0700 Subject: Unsubscribe me Message-ID: Unsubscribe me from email alias -------------- next part -------------- An HTML attachment was scrubbed... 
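On the build question just above: the Fortran interface and the C/C++ language choice are controlled by separate configure options, so a single build can provide both. A sketch of the kind of configure line involved (compiler names are placeholders; the option names match those cited in the reply that follows, and the invocation is the 2.3.x-era python configure):

./config/configure.py --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-clanguage=cxx

The same PETSC_ARCH then serves C++ and Fortran application code alike.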
URL: From knepley at gmail.com Mon May 5 13:14:30 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 May 2008 13:14:30 -0500 Subject: Unsubscribe me In-Reply-To: References: Message-ID: On Mon, May 5, 2008 at 1:10 PM, Trivedi, Jagruti CIV 470000D, 474300D wrote: > > Unsubscribe me from email alias You can actually unsubscribe yourself from petsc-users by sending a mail to majordomo. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stephane.aubert at fluorem.com Tue May 6 12:20:07 2008 From: stephane.aubert at fluorem.com (Stephane Aubert) Date: Tue, 06 May 2008 19:20:07 +0200 Subject: How to implement a block version of MatDiagonalScale? Message-ID: <48209347.1060500@fluorem.com> Hello, I'm "playing" with badly conditioned block matrices (MPIBAIJ type) arising from turbulent compressible fluid dynamics (RANS eqs, block size = 5 (laminar) or 7 (turbulent)). I tested that using bi-normalization approach to build l and r vectors, then calling MatDiagonalScale(A,l,r), improve the accuracy and the convergence of GMRES+ILU(0), basically by reducing the dependence to the mesh cells size and to the variables magnitudes (in particular for k-w turbulence model). Now, I would like to go further and use L and R block diagonal matrices, instead of vectors, for example to get block identity along the diagonal of L.A.R. The sparcity of A is preserved, from a block point of view. My first idea is to use a LU factorization of Aii, with L=Li^(-1), R=Ui^(-1), Aii=Li.Ui. My question is: How to compute L and R using PETSC available functions in a clever way, instead of calling some LAPACK functions after getting individual diagonal blocks in my own piece of code? My actual guess is: 1. Create a new matrix D with MPIBDIAG type, from the block-diagonal of A using MatGetSubMatrices(), to do something like MatGetDiagonal(). 2. Create a PC of type PCLU using D as operators, with the sequence PCCreate/PCSetType/PCSetOperators/PCSetUp. 3. Get Li and Ui. This is where I'm glued: I can't figure out how to use PCGetFactoredMatrix() to get Li and Ui... Is MatLUFactor() a better candidate? 4. Do I need to compute Li^(-1) and Ui^(-1), or is there some "backsubstitution" functions available? With many thanks for your suggestions, Stef. -- ___________________________________________________________ Dr. Stephane AUBERT, CEO & CTO FLUOREM s.a.s Centre Scientifique Auguste MOIROUX 64 chemin des MOUILLES F-69130 ECULLY, FRANCE International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 email: stephane.aubert at fluorem.com web: www.fluorem.com From Amit.Itagi at seagate.com Tue May 6 12:53:10 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 6 May 2008 13:53:10 -0400 Subject: DA question In-Reply-To: <1D003C34-5E65-4340-98EC-8274AA32BA16@mcs.anl.gov> Message-ID: > > One question : How compatible is PetSc with Blitz++ ? Can I declare > > the > > array to be returned by DAVecGetArray to be a Blitz array ? > > Likely you would need to use VecGetArray() and then somehow build > the Blitz > array using the pointer returned and the sizes of the local part of > the DA. > > If you figure out how to do this then maybe we could have a > DAVecGetArrayBlitz() > > Barry > Barry, I did some more thinking about this. 
If I have a standard C array (any dimension) that is stored in a contiguous block of memory (with regular ordering), there is a Blitz constructor that can convert it to a Blitz array. I took a look at the source of DAVecGetArray. The array generation depends (in the VecGetArray3d source) on VecGetArray, and a short code that allocates storage to store the pointers and to do the pointer assignments to appropriate parts of VecGetArray. Looking at the assignments, it looks like the 3D array returned by DAVecGetArray has the contiguous, regularly ordered storage format that Blitz expects. Is this correct ? Thanks Rgds, Amit From knepley at gmail.com Tue May 6 14:13:13 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 May 2008 14:13:13 -0500 Subject: DA question In-Reply-To: References: <1D003C34-5E65-4340-98EC-8274AA32BA16@mcs.anl.gov> Message-ID: On Tue, May 6, 2008 at 12:53 PM, wrote: > > > One question : How compatible is PetSc with Blitz++ ? Can I declare > > > the > > > array to be returned by DAVecGetArray to be a Blitz array ? > > > > Likely you would need to use VecGetArray() and then somehow build > > the Blitz > > array using the pointer returned and the sizes of the local part of > > the DA. > > > > If you figure out how to do this then maybe we could have a > > DAVecGetArrayBlitz() > > > > Barry > > > > > Barry, > > I did some more thinking about this. If I have a standard C array (any > dimension) that is stored in a contiguous block of memory (with regular > ordering), there is a Blitz constructor that can convert it to a Blitz > array. > > I took a look at the source of DAVecGetArray. The array generation depends > (in the VecGetArray3d source) on VecGetArray, and a short code that > allocates storage to store the pointers and to do the pointer assignments > to appropriate parts of VecGetArray. Looking at the assignments, it looks > like the 3D array returned by DAVecGetArray has the contiguous, regularly > ordered storage format that Blitz expects. Is this correct ? All DA storage is just contiguous blocks of memory, like a PETSc Vec. Matt > Thanks > > Rgds, > Amit -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Tue May 6 15:47:37 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 May 2008 15:47:37 -0500 Subject: DA question In-Reply-To: References: Message-ID: <8C9938D0-AE7F-4EE2-874E-E42285797634@mcs.anl.gov> On May 6, 2008, at 12:53 PM, Amit.Itagi at seagate.com wrote: > > > >>> One question : How compatible is PetSc with Blitz++ ? Can I declare >>> the >>> array to be returned by DAVecGetArray to be a Blitz array ? >> >> Likely you would need to use VecGetArray() and then somehow build >> the Blitz >> array using the pointer returned and the sizes of the local part of >> the DA. >> >> If you figure out how to do this then maybe we could have a >> DAVecGetArrayBlitz() >> >> Barry >> > > > Barry, > > I did some more thinking about this. If I have a standard C array (any > dimension) that is stored in a contiguous block of memory (with > regular > ordering), there is a Blitz constructor that can convert it to a Blitz > array. > > I took a look at the source of DAVecGetArray. The array generation > depends > (in the VecGetArray3d source) on VecGetArray, and a short code that > allocates storage to store the pointers and to do the pointer > assignments > to appropriate parts of VecGetArray. 
Looking at the assignments, it > looks > like the 3D array returned by DAVecGetArray has the contiguous, > regularly > ordered storage format that Blitz expects. Is this correct ? The value returned by VecGetArray() returns a simple contiguous, regularly > > ordered storage format that Blitz expects. I highly recommend you > simply call the VecGetArray() and then the Blitz constructor; there is absolutely no reason to use the 3d array returned by DAVecGetArray() for this. Barry > > > Thanks > > Rgds, > Amit > From bsmith at mcs.anl.gov Tue May 6 15:57:35 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 May 2008 15:57:35 -0500 Subject: How to implement a block version of MatDiagonalScale? In-Reply-To: <48209347.1060500@fluorem.com> References: <48209347.1060500@fluorem.com> Message-ID: <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> Stef, You do not want to do it either way you have outlined below. Since the blocks are of size 5 or 7 you DO NOT EVER IN A BILLION YEARS WANT to use LAPACK/BLAS to do the little factorizations and solves. The overhead of the LAPACK/BLAS will kill performance for that size array. Similarly using the PETSc Mat objects and PC objects are not suitable for those tiny matrices. Here is what I would do. Take a look at the function MatLUFactorNumeric_SeqBAIJ_5() in the file src/mat/impls/baij/seq/ baijfact9.c Note it uses Kernel_A_gets_inverse_A_5() to invert the little 5 by 5 blocks. Also it uses inlined code to do the little 5 by 5 matrix matrix multiplies. (Yes for this size problem you do want to invert the little 5 by 5 matrices and do matrix matrix products instead of only doing 5 by 5 LU factorizations and lots of triangular solves; it is much faster this way.) I think by looking at this subroutine you can figure out how to loop over the nonzero blocks of A and multiply by the appropriate diagonal blocks you have inverted to obtain the new block diagonal scaled A matrix. Barry We would like to include the resulting code in PETSc if you would like to donate it. Thanks On May 6, 2008, at 12:20 PM, Stephane Aubert wrote: > Hello, > I'm "playing" with badly conditioned block matrices (MPIBAIJ type) > arising from turbulent compressible fluid dynamics (RANS eqs, block > size = 5 (laminar) or 7 (turbulent)). > I tested that using bi-normalization approach to build l and r > vectors, then calling MatDiagonalScale(A,l,r), improve the accuracy > and the convergence of GMRES+ILU(0), basically by reducing the > dependence to the mesh cells size and to the variables magnitudes > (in particular for k-w turbulence model). > Now, I would like to go further and use L and R block diagonal > matrices, instead of vectors, for example to get block identity > along the diagonal of L.A.R. The sparcity of A is preserved, from a > block point of view. > My first idea is to use a LU factorization of Aii, with L=Li^(-1), > R=Ui^(-1), Aii=Li.Ui. > > My question is: How to compute L and R using PETSC available > functions in a clever way, instead of calling some LAPACK functions > after getting individual diagonal blocks in my own piece of code? > > My actual guess is: > > 1. Create a new matrix D with MPIBDIAG type, from the block-diagonal > of A using MatGetSubMatrices(), to do something like > MatGetDiagonal(). > 2. Create a PC of type PCLU using D as operators, with the sequence > PCCreate/PCSetType/PCSetOperators/PCSetUp. > 3. Get Li and Ui. This is where I'm glued: I can't figure out how to > use PCGetFactoredMatrix() to get Li and Ui... 
Is MatLUFactor() a > better candidate? > 4. Do I need to compute Li^(-1) and Ui^(-1), or is there some > "backsubstitution" functions available? > > With many thanks for your suggestions, > Stef. > > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > From Amit.Itagi at seagate.com Tue May 6 16:04:03 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 6 May 2008 17:04:03 -0400 Subject: DA question In-Reply-To: <8C9938D0-AE7F-4EE2-874E-E42285797634@mcs.anl.gov> Message-ID: Barry, You are right. The VecGetArray followed by the Blitz constructor works fine. Thanks Rgds, Amit owner-petsc-users at mcs.anl.gov wrote on 05/06/2008 04:47:37 PM: > > On May 6, 2008, at 12:53 PM, Amit.Itagi at seagate.com wrote: > > > > > > > > >>> One question : How compatible is PetSc with Blitz++ ? Can I declare > >>> the > >>> array to be returned by DAVecGetArray to be a Blitz array ? > >> > >> Likely you would need to use VecGetArray() and then somehow build > >> the Blitz > >> array using the pointer returned and the sizes of the local part of > >> the DA. > >> > >> If you figure out how to do this then maybe we could have a > >> DAVecGetArrayBlitz() > >> > >> Barry > >> > > > > > > Barry, > > > > I did some more thinking about this. If I have a standard C array (any > > dimension) that is stored in a contiguous block of memory (with > > regular > > ordering), there is a Blitz constructor that can convert it to a Blitz > > array. > > > > I took a look at the source of DAVecGetArray. The array generation > > depends > > (in the VecGetArray3d source) on VecGetArray, and a short code that > > allocates storage to store the pointers and to do the pointer > > assignments > > to appropriate parts of VecGetArray. Looking at the assignments, it > > looks > > like the 3D array returned by DAVecGetArray has the contiguous, > > regularly > > ordered storage format that Blitz expects. Is this correct ? > > The value returned by VecGetArray() returns a simple contiguous, > regularly > > > > ordered storage format that Blitz expects. I highly recommend you > > simply > call the VecGetArray() and then the Blitz constructor; there is > absolutely no reason > to use the 3d array returned by DAVecGetArray() for this. > > Barry > > > > > > > Thanks > > > > Rgds, > > Amit > > > From matthew.gross1 at navy.mil Wed May 7 14:02:26 2008 From: matthew.gross1 at navy.mil (Gross, Matthew CIV NAVAIR, 474200D) Date: Wed, 7 May 2008 15:02:26 -0400 Subject: Unsubscribe me In-Reply-To: References: Message-ID: Unsubscribe me from email alias From neckel at in.tum.de Thu May 8 09:07:32 2008 From: neckel at in.tum.de (Tobias Neckel) Date: Thu, 08 May 2008 16:07:32 +0200 Subject: Question on matrix preallocation Message-ID: <48230924.7040600@in.tum.de> Hello, when using petsc (version 2.3.2 on a linux 32bit Intel architecture) to set up a serial sparse linear system of equations, I recently noticed the well-known allocation performance problem: The matrix setup needs more memory than preallocated with a fixed number of column entries for all rows. 
Thus, I switched to the strategy described in the Users Manual (first counting the number of matrix entries for each row individually and then using the nnz parameter in MatCreateSeqAIJ()). But this did not change the dynamic allocation behaviour at all. Therefore, I tried to bring everything down to a (very) small test example. I set up a nnz-1D-array of type int and length 4 which holds the number of expected non-zero column entries for each row of a matrix (in particular 2 columns in row 0). Using this nnz-array, I create a 4x4 matrix. Afterwards, I set the entry (0,0) of the matrix to a non-zero value. The source code part for this simple test can be found in the attached file testMatPreallocation.cpp. When I run this test (with the additional -info runtime option), the one and only matrix entry setting results in an additional memory allocation (see attached file commandLineOutput.txt)! This is quite surprising, as I would have expected enough preallocated memory for the matrix, which is also visible from the output. Am I misusing or missing something necessary to make the preallocation work? Thanks in advance for any hints, best regards Tobias Neckel -- Dipl.-Tech. Math. Tobias Neckel Institut f?r Informatik V, TU M?nchen Boltzmannstr. 3, 85748 Garching Tel.: 089/289-18602 Email: neckel at in.tum.de URL: http://www5.in.tum.de/persons/neckel.html -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: commandLineOutput.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: testMatPreallocation.cpp Type: text/x-c++src Size: 1693 bytes Desc: not available URL: From knepley at gmail.com Thu May 8 09:43:19 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 8 May 2008 09:43:19 -0500 Subject: Question on matrix preallocation In-Reply-To: <48230924.7040600@in.tum.de> References: <48230924.7040600@in.tum.de> Message-ID: You are assembling before inserting values. This wipes out the preallocation information since assembly shrinks the matrix to an optimal size. Matt On Thu, May 8, 2008 at 9:07 AM, Tobias Neckel wrote: > Hello, > > when using petsc (version 2.3.2 on a linux 32bit Intel architecture) to set > up a serial sparse linear system of equations, I recently noticed the > well-known allocation performance problem: The matrix setup needs more > memory than preallocated with a fixed number of column entries for all rows. > Thus, I switched to the strategy described in the Users Manual (first > counting the number of matrix entries for each row individually and then > using the nnz parameter in MatCreateSeqAIJ()). But this did not change the > dynamic allocation behaviour at all. > > Therefore, I tried to bring everything down to a (very) small test example. > I set up a nnz-1D-array of type int and length 4 which holds the number of > expected non-zero column entries for each row of a matrix (in particular 2 > columns in row 0). Using this nnz-array, I create a 4x4 matrix. Afterwards, > I set the entry (0,0) of the matrix to a non-zero value. > The source code part for this simple test can be found in the attached file > testMatPreallocation.cpp. > > When I run this test (with the additional -info runtime option), the one > and only matrix entry setting results in an additional memory allocation > (see attached file commandLineOutput.txt)! > > This is quite surprising, as I would have expected enough preallocated > memory for the matrix, which is also visible from the output. 
Am I misusing > or missing something necessary to make the preallocation work? > > Thanks in advance for any hints, > best regards > Tobias Neckel > > -- > Dipl.-Tech. Math. Tobias Neckel > > Institut f?r Informatik V, TU M?nchen > Boltzmannstr. 3, 85748 Garching > > Tel.: 089/289-18602 > Email: neckel at in.tum.de > URL: http://www5.in.tum.de/persons/neckel.html > > 14:33:53 debug petsc::PETScLibTest::testMatPreallocation() > start PETSc mat preallocation test > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 > max tags = 2147483647 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4 X 4; storage space: 10 > unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] Mat_CheckInode(): Found 1 nodes of 4. Limit used: 5. Using Inode > routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4 X 4; storage space: 14 > unneeded,1 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 1 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > [0] Mat_CheckInode(): Found 2 nodes of 4. Limit used: 5. Using Inode > routines > 14:33:53 debug petsc::PETScLibTest::testMatPreallocation() > stop PETSc mat preallocation test > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Thu May 8 13:58:03 2008 From: w_subber at yahoo.com (Waad Subber) Date: Thu, 8 May 2008 11:58:03 -0700 (PDT) Subject: Cannot use PETSC_DEFAULT in MatMatMult Message-ID: <132575.84711.qm@web38205.mail.mud.yahoo.com> Hi, I am trying to multiply to spares matrices. I am using PETSC_DEFAULT for the fill ratio; however, I cannot compile the code I get the following error message : Error: master.F, line 55: This name does not have a type, and must have an explicit type. [PETSC_DEFAULT] call MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,C,ierr) ----------------------------------------------------------^ compilation aborted for master.F (code 1) Thanks Waad --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 8 14:15:19 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 8 May 2008 14:15:19 -0500 Subject: Cannot use PETSC_DEFAULT in MatMatMult In-Reply-To: <132575.84711.qm@web38205.mail.mud.yahoo.com> References: <132575.84711.qm@web38205.mail.mud.yahoo.com> Message-ID: <4388AB93-EABA-434E-B5EE-5A6682804D46@mcs.anl.gov> In Fortran you must use PETSC_DEFAULT_DOUBLE_PRECISION. I'll make sure this is may clear in the docs Barry There is also a PETSC_DEFAULT_INTEGER for Fortran On May 8, 2008, at 1:58 PM, Waad Subber wrote: > Hi, > > I am trying to multiply to spares matrices. I am using PETSC_DEFAULT > for the fill ratio; however, I cannot compile the code I get the > following error message : > > Error: master.F, line 55: This name does not have a type, and must > have an explicit type. [PETSC_DEFAULT] > call MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,C,ierr) > ----------------------------------------------------------^ > compilation aborted for master.F (code 1) > > Thanks > > Waad > > > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. > Try it now. 
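Returning to the matrix preallocation thread above, a minimal sketch of the ordering Matt describes - create with the nnz array, insert everything, and assemble exactly once at the end - using the same 4x4 toy dimensions; this is an illustration, not the original testMatPreallocation.cpp:

Mat            A;
PetscInt       nnz[4] = {2,1,1,1};   /* expected nonzeros per row */
PetscInt       row = 0, col = 0;
PetscScalar    val = 1.0;
PetscErrorCode ierr;

ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,4,4,0,nnz,&A);CHKERRQ(ierr);
/* insert all values before the first assembly; assembling first discards
   the preallocated space, which is what produced the malloc in the -info log */
ierr = MatSetValues(A,1,&row,1,&col,&val,INSERT_VALUES);CHKERRQ(ierr);
/* ... remaining MatSetValues() calls ... */
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);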
From griffith at cims.nyu.edu Thu May 8 18:00:07 2008 From: griffith at cims.nyu.edu (Boyce Griffith) Date: Thu, 08 May 2008 19:00:07 -0400 Subject: VecMultiVec? Message-ID: <482385F7.3040902@cims.nyu.edu> Hi, Folks -- I'm pretty sure this has been discussed in the list previously, but I'm having trouble digging up the thread in the archive, so apologies in advance... I need to solve some nonlinear equations that involve both PETSc Vecs as well as data that is stored in a non-PETSc-native format, and I was wondering if someone happens to have a freely available implementation of a "VecMultiVec" --- a vector which contains multiple Vec objects. I think it should be fairly straightforward to implement such a beast, but I thought I'd ask before doing it myself. Thanks, -- Boyce From zonexo at gmail.com Fri May 9 07:33:21 2008 From: zonexo at gmail.com (Ben Tay) Date: Fri, 09 May 2008 20:33:21 +0800 Subject: How to efficiently change just the diagonal vector in a matrix at every time step Message-ID: <48244491.2030303@gmail.com> Hi, I have a matrix and I inserted all the relevant values during the 1st step. I'll then solve it. For the subsequent steps, I only need to change the diagonal vector of the matrix before solving. I wonder how I can do it efficiently. Of course, the RHS vector also change but I've not included them here. I set these at the 1st step: call KSPSetOperators(ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) ksptype=KSPRICHARDSON call KSPSetType(ksp_semi_x,ksptype,ierr) ptype = PCILU call PCSetType(pc_semi_x,ptype,ierr) call KSPSetFromOptions(ksp_semi_x,ierr) call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) tol=1.e-5 call KSPSetTolerances(ksp_semi_x,tol,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) and what I did at the subsequent steps is: do II=1,total call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) end do call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) I realise that the answers are slightly different as compared to calling all the options such as KSPSetType, KSPSetFromOptions, KSPSetTolerances at every time step. Should that be so? Is this the best way? Also, I can let the matrix be equal at every time step by fixing the delta_time. However, it may give stability problems. I wonder how expensive is these type of value changing and assembly for a matrix? Thank you very much. Regards. From sdettrick at gmail.com Fri May 9 07:50:30 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Fri, 9 May 2008 14:50:30 +0200 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <48244491.2030303@gmail.com> References: <48244491.2030303@gmail.com> Message-ID: <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> One way to do it is to have two Mats, A and B, and a Vec, D, to store the diagonal. A is constructed only on the first step. On subsequent steps, A is copied into B, and then D is added to the diagonal: ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); ierr = MatDiagonalSet( B, D, ADD_VALUES ); The KSP uses B as the matrix, not A. I don't know if this approach is efficient or not. Can anybody comment? Thanks, Sean On May 9, 2008, at 2:33 PM, Ben Tay wrote: > Hi, > I have a matrix and I inserted all the relevant values during the > 1st step. I'll then solve it. 
For the subsequent steps, I only need > to change the diagonal vector of the matrix before solving. I wonder > how I can do it efficiently. Of course, the RHS vector also change > but I've not included them here. > > I set these at the 1st step: > > call > KSPSetOperators > (ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) > > call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) > > ksptype=KSPRICHARDSON > > call KSPSetType(ksp_semi_x,ksptype,ierr) > > ptype = PCILU > > call PCSetType(pc_semi_x,ptype,ierr) > > call KSPSetFromOptions(ksp_semi_x,ierr) > > call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) > > tol=1.e-5 > > call > KSPSetTolerances > (ksp_semi_x > ,tol > ,PETSC_DEFAULT_DOUBLE_PRECISION > ,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > From bsmith at mcs.anl.gov Fri May 9 10:48:22 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 May 2008 10:48:22 -0500 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <48244491.2030303@gmail.com> References: <48244491.2030303@gmail.com> Message-ID: <1FCC7926-A348-4191-8683-DF3E23BB15B3@mcs.anl.gov> > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) You can/should call KSPSetOperators() here EACH time, otherwise the KSPSolve does not know it has a new matrix and will not build a new preconditioner. Hence it continues to use the original preconditioner. This is why "answers are slightly different". If the original preconditioner works ok for all timesteps then you do not need to call the KSPSetOperators() Barry > > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > From Amit.Itagi at seagate.com Fri May 9 12:46:59 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Fri, 9 May 2008 13:46:59 -0400 Subject: Code structuring - Communicator Message-ID: Hi, I have a question about the Petsc communicator. I have a petsc program "foo" which essentially runs in parallel and gives me y=f(x1,x2,...), where y is an output parameter and xi's are input parameters. 
Suppose, I want to run a parallel optimizer for the input parameters. I am looking at the following functionality. I submit the optimizer job on 16 processors (using "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of "foo", each running parallely on 4 processors. "foo" will be written as a function and not as a main program in this case. How can I get this functionality using Petsc ? Should PetscInitialize be called in the optimizer, or in each foo run ? If PetscInitialize is called in the optimizer, is there a way to make the foo function run only on a subset of the 16 processors ? May be, I haven't done a good job of explaining my problem. Let me know if you need any clarifications. Thanks Rgds, Amit From bsmith at mcs.anl.gov Fri May 9 14:07:10 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 May 2008 14:07:10 -0500 Subject: Code structuring - Communicator In-Reply-To: References: Message-ID: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. "foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. 
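A rough outline of the first approach Barry describes above - split PETSC_COMM_WORLD into subcommunicators and create every object on the subcommunicator - where the group size of 4 and the vector length are placeholders and foo() stands for the user's solver:

#include "petscvec.h"

int main(int argc,char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank,color;
  Vec            x;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  color = rank/4;                              /* 16 ranks -> 4 groups of 4 */
  ierr = MPI_Comm_split(PETSC_COMM_WORLD,color,rank,&subcomm);CHKERRQ(ierr);

  /* inside each group, foo() creates its objects on subcomm, never on
     PETSC_COMM_WORLD, for example: */
  ierr = VecCreateMPI(subcomm,PETSC_DECIDE,1000,&x);CHKERRQ(ierr);
  /* ... assemble, solve, and reduce the objective value for this group ... */

  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The optimizer layer still communicates across PETSC_COMM_WORLD (for instance to gather the four objective values), while each instance of foo() stays confined to its own subcommunicator.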
> > Thanks > > Rgds, > Amit > From zonexo at gmail.com Sat May 10 03:38:30 2008 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 May 2008 16:38:30 +0800 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> References: <48244491.2030303@gmail.com> <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> Message-ID: <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> Hi Sean, Maybe for me, I can just insert vector diagonal D into the matrix A and call Assembly and KSP at every time step. Should that be better since there is no need to copy A into B? Thanks! On Fri, May 9, 2008 at 8:50 PM, Sean Dettrick wrote: > > One way to do it is to have two Mats, A and B, and a Vec, D, to store the > diagonal. A is constructed only on the first step. On subsequent steps, A > is copied into B, and then D is added to the diagonal: > > ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); > ierr = MatDiagonalSet( B, D, ADD_VALUES ); > > The KSP uses B as the matrix, not A. > > I don't know if this approach is efficient or not. Can anybody comment? > > Thanks, > Sean > > > > > On May 9, 2008, at 2:33 PM, Ben Tay wrote: > > Hi, >> I have a matrix and I inserted all the relevant values during the 1st >> step. I'll then solve it. For the subsequent steps, I only need to change >> the diagonal vector of the matrix before solving. I wonder how I can do it >> efficiently. Of course, the RHS vector also change but I've not included >> them here. >> >> I set these at the 1st step: >> >> call >> KSPSetOperators(ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) >> >> call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) >> >> ksptype=KSPRICHARDSON >> >> call KSPSetType(ksp_semi_x,ksptype,ierr) >> >> ptype = PCILU >> >> call PCSetType(pc_semi_x,ptype,ierr) >> >> call KSPSetFromOptions(ksp_semi_x,ierr) >> >> call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) >> >> tol=1.e-5 >> >> call >> KSPSetTolerances(ksp_semi_x,tol,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) >> >> and what I did at the subsequent steps is: >> >> do II=1,total >> call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) >> >> end do >> >> call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) >> >> call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) >> >> call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) >> >> I realise that the answers are slightly different as compared to calling >> all the options such as KSPSetType, KSPSetFromOptions, KSPSetTolerances at >> every time step. Should that be so? Is this the best way? >> >> Also, I can let the matrix be equal at every time step by fixing the >> delta_time. However, it may give stability problems. I wonder how expensive >> is these type of value changing and assembly for a matrix? >> >> Thank you very much. >> >> Regards. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sat May 10 16:17:49 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 10 May 2008 18:17:49 -0300 Subject: VecMultiVec? In-Reply-To: <482385F7.3040902@cims.nyu.edu> References: <482385F7.3040902@cims.nyu.edu> Message-ID: Depending on what you need to actually do with your multiple vectors, PETSc do have a 'Vecs' (note de final 's') wich is just an array with Vec items... However, as you are working with non-native data, perhaps this will not be useful.. Take a look... 
On 5/8/08, Boyce Griffith wrote: > Hi, Folks -- > > I'm pretty sure this has been discussed in the list previously, but I'm > having trouble digging up the thread in the archive, so apologies in > advance... > > I need to solve some nonlinear equations that involve both PETSc Vecs as > well as data that is stored in a non-PETSc-native format, and I was > wondering if someone happens to have a freely available implementation of a > "VecMultiVec" --- a vector which contains multiple Vec objects. I think it > should be fairly straightforward to implement such a beast, but I thought > I'd ask before doing it myself. > > Thanks, > > -- Boyce > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From sdettrick at gmail.com Mon May 12 05:42:47 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Mon, 12 May 2008 12:42:47 +0200 Subject: How to efficiently change just the diagonal vector in a matrix at every time step In-Reply-To: <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> References: <48244491.2030303@gmail.com> <1F063728-F37F-425B-B137-4DF9C3D73921@gmail.com> <804ab5d40805100138t748676d7r3a074658ea73188a@mail.gmail.com> Message-ID: <210E7E60-4FEB-41AD-9DE9-C9ACBB6AFB05@gmail.com> On May 10, 2008, at 10:38 AM, Ben Tay wrote: > Hi Sean, > > Maybe for me, I can just insert vector diagonal D into the matrix A > and call Assembly and KSP at every time step. Should that be better > since there is no need to copy A into B? > > Thanks! Yes, that sounds like it would be faster. I suppose you would use INSERT_VALUES rather than ADD_VALUES. In my case I copy the whole matrix because constructing the time- constant part of the diagonal is very complicated. But now that you mention it, I could store both the time-constant and time-varying diagonal components in two separate Vecs, and only have one Mat, and then do MatDiagonalSet() twice at each timestep - the first time with INSERT_VALUES, the second with ADD_VALUES. That sounds like it would be faster. Thanks to you too, Sean > > > On Fri, May 9, 2008 at 8:50 PM, Sean Dettrick > wrote: > > One way to do it is to have two Mats, A and B, and a Vec, D, to > store the diagonal. A is constructed only on the first step. On > subsequent steps, A is copied into B, and then D is added to the > diagonal: > > ierr = MatCopy( A, B, SAME_NON_ZERO_PATTERN ); > ierr = MatDiagonalSet( B, D, ADD_VALUES ); > > The KSP uses B as the matrix, not A. > > I don't know if this approach is efficient or not. Can anybody > comment? > > Thanks, > Sean > > > > > On May 9, 2008, at 2:33 PM, Ben Tay wrote: > > Hi, > I have a matrix and I inserted all the relevant values during the > 1st step. I'll then solve it. For the subsequent steps, I only need > to change the diagonal vector of the matrix before solving. I wonder > how I can do it efficiently. Of course, the RHS vector also change > but I've not included them here. 
> > I set these at the 1st step: > > call > KSPSetOperators > (ksp_semi_x,A_semi_x,A_semi_x,SAME_NONZERO_PATTERN,ierr) > > call KSPGetPC(ksp_semi_x,pc_semi_x,ierr) > > ksptype=KSPRICHARDSON > > call KSPSetType(ksp_semi_x,ksptype,ierr) > > ptype = PCILU > > call PCSetType(pc_semi_x,ptype,ierr) > > call KSPSetFromOptions(ksp_semi_x,ierr) > > call KSPSetInitialGuessNonzero(ksp_semi_x,PETSC_TRUE,ierr) > > tol=1.e-5 > > call > KSPSetTolerances > (ksp_semi_x > ,tol > ,PETSC_DEFAULT_DOUBLE_PRECISION > ,PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr) > > and what I did at the subsequent steps is: > > do II=1,total > call MatSetValues(A_semi_x,1,II,1,II,new_value,INSERT_VALUES,ierr) > > end do > > call MatAssemblyBegin(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(A_semi_x,MAT_FINAL_ASSEMBLY,ierr) > > call KSPSolve(ksp_semi_x,b_rhs_semi_x,xx_semi_x,ierr) > > I realise that the answers are slightly different as compared to > calling all the options such as KSPSetType, KSPSetFromOptions, > KSPSetTolerances at every time step. Should that be so? Is this the > best way? > > Also, I can let the matrix be equal at every time step by fixing the > delta_time. However, it may give stability problems. I wonder how > expensive is these type of value changing and assembly for a matrix? > > Thank you very much. > > Regards. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Amit.Itagi at seagate.com Mon May 12 08:27:50 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 09:27:50 -0400 Subject: Code structuring - Communicator In-Reply-To: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: Thanks, Barry. Rgds, Amit Barry Smith To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: Code structuring - Communicator 05/09/2008 03:07 PM Please respond to petsc-users at mcs.a nl.gov There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. 
"foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. > > Thanks > > Rgds, > Amit > From mfatenejad at wisc.edu Mon May 12 13:10:29 2008 From: mfatenejad at wisc.edu (Milad Fatenejad) Date: Mon, 12 May 2008 13:10:29 -0500 Subject: 2 Questions about DAs Message-ID: Hello: First, I'm having some email problems, so sorry if this shows up twice... I am using PETSc to write a large multi-physics finite difference code with a lot of opportunity for overlapping computation and communication. Right now, I have created ~100 petsc vectors for storing various quantities, which currently all share a single DA. The problem with this system is that I can only scatter one quantity at a time to update the values of the ghost points. If I try to scatter more than one object at a time, I get the following error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! It would be really nice to be able to start scattering a vector whenever I am done with a computation, and just finish the scatter whenever I need the vector again. Again, this is impossible because all of the vectors share the same DA. I then reorganized my code, so that each vector had its own DA, however, this led to the program running significantly more slowly (I assume this is just because I have so many vectors). So my first question is: Is there a way to organize the code so I can overlap the scattering of vectors without having a significant performance hit? And on a related note, many times I need to create arrays of vectors. I just discovered the function "VecDuplicateVecs" (and related functions), which look like performs this operation. Is this the best way to create arrays of vectors? Is there a way to directly get the array from the DA without having to create a vector and duplicate it (I don't see a "DACreateGlobalVectorS")? I know it is also possible to do something like this using the DOF parameter in the DACreate call as shown in: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Are there any advantages to using dof as opposed to VecDuplicateVecs, etc.? I'd appreciate any help Thank You Milad Fatenejad From mfatenejad at wisc.edu Mon May 12 11:02:33 2008 From: mfatenejad at wisc.edu (Milad Fatenejad) Date: Mon, 12 May 2008 11:02:33 -0500 Subject: 2 Questions about DAs Message-ID: Hello: I have two separate DA questions: 1) I am writing a large finite difference code and would like to be able to represent an array of vectors. I am currently doing this by creating a single DA and calling DACreateGlobalVector several times, but the manual also states that: "PETSc currently provides no container for multiple arrays sharing the same distributed array communication; note, however, that the dof parameter handles many cases of interest." 
I also found the following mailing list thread which describes how to use the dof parameter to represent several vectors: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Where the following solution is proposed: """ The easiest thing to do in C is to declare a struct: typedef struct { PetscScalar v[3]; PetscScalar p; } Space; and then cast pointers Space ***array; DAVecGetArray(da, u, (void *) &array); array[k][j][i].v *= -1.0; """ The problem with the proposed solution, is that they use a struct to get the individual values, but what if you don't know the number of degrees of freedom at compile time? So my question is two fold: a) Is there a problem with just having a single DA and calling DACreateGlobalVector multiple times? Does this affect performance at all (I have many different vectors)? b) Is there a way to use the dof parameter when creating a DA when the number of degrees of freedom is not known at compile time? Specifically, I would like to be able to access the individual values of the vector, just like the example shows... 2) The code I am writing has a lot of different parts which present a lot of opportunities to overlap communication an computation when scattering vectors to update values in the ghost points. Right now, all of my vectors (there are ~50 of them) share a single DA because they all have the same shape. However, by sharing a single DA, I can only scatter one vector at a time. It would be nice to be able to start scattering each vector right after I'm done computing it, and finish scattering it right before I need it again but I can't because other vectors might need to be scattered in between. I then re-wrote part of my code so that each vector had its own DA object, but this ended up being incredibly slow (I assume this is because I have so many vectors). My question is, is there a way to scatter multiple vectors simultaneously without affecting the performance of the code? Does it make sense to do this? I'd really appreciate any help... Thanks Milad Fatenejad From icksa1 at gmail.com Mon May 12 13:41:02 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 13:41:02 -0500 Subject: 2 Questions about DAs Message-ID: Hello: First, I'm having some email problems, so sorry if this shows up a few times... I am using PETSc to write a large multi-physics finite difference code with a lot of opportunity for overlapping computation and communication. Right now, I have created ~100 petsc vectors for storing various quantities, which currently all share a single DA. The problem with this system is that I can only scatter one quantity at a time to update the values of the ghost points. If I try to scatter more than one object at a time, I get the following error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! It would be really nice to be able to start scattering a vector whenever I am done with a computation, and just finish the scatter whenever I need the vector again. Again, this is impossible because all of the vectors share the same DA. I then reorganized my code, so that each vector had its own DA, however, this led to the program running significantly more slowly (I assume this is just because I have so many vectors). So my first question is: Is there a way to organize the code so I can overlap the scattering of vectors without having a significant performance hit? And on a related note, many times I need to create arrays of vectors. 
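For the arrays-of-vectors part of the question, VecDuplicateVecs() takes a run-time count; a small sketch (names hypothetical, argument orders as in PETSc 2.3.x):

/* Sketch: an "array of Vecs" of run-time length, all sharing one DA layout. */
PetscErrorCode CreateFieldArray(DA da, PetscInt nfields, Vec **fields)
{
  PetscErrorCode ierr;
  Vec            tmpl;

  PetscFunctionBegin;
  ierr = DACreateGlobalVector(da, &tmpl);CHKERRQ(ierr);
  ierr = VecDuplicateVecs(tmpl, nfields, fields);CHKERRQ(ierr); /* (*fields)[0..nfields-1] */
  ierr = VecDestroy(tmpl);CHKERRQ(ierr);   /* the template itself is no longer needed */
  PetscFunctionReturn(0);
}
/* Free later with VecDestroyVecs(*fields, nfields); note the argument order
   shown is the 2.3.x one and differs in newer PETSc releases. */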
I just discovered the function "VecDuplicateVecs" (and related functions), which look like performs this operation. Is this the best way to create arrays of vectors? Is there a way to directly get the array from the DA without having to create a vector and duplicate it (I don't see a "DACreateGlobalVectorS")? I know it is also possible to do something like this using the DOF parameter in the DACreate call as shown in: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Are there any advantages to using dof as opposed to VecDuplicateVecs, etc.? I'd appreciate any help Thank You Milad Fatenejad From Amit.Itagi at seagate.com Mon May 12 13:43:50 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 14:43:50 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: Milad, I have a DA with 6 vectors sharing the DA structure. I defined the DA to have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter the vectors separately. Things seem to work fine. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available 2 Questions about DAs 05/12/2008 12:02 PM Please respond to petsc-users at mcs.a nl.gov Hello: I have two separate DA questions: 1) I am writing a large finite difference code and would like to be able to represent an array of vectors. I am currently doing this by creating a single DA and calling DACreateGlobalVector several times, but the manual also states that: "PETSc currently provides no container for multiple arrays sharing the same distributed array communication; note, however, that the dof parameter handles many cases of interest." I also found the following mailing list thread which describes how to use the dof parameter to represent several vectors: http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html Where the following solution is proposed: """ The easiest thing to do in C is to declare a struct: typedef struct { PetscScalar v[3]; PetscScalar p; } Space; and then cast pointers Space ***array; DAVecGetArray(da, u, (void *) &array); array[k][j][i].v *= -1.0; """ The problem with the proposed solution, is that they use a struct to get the individual values, but what if you don't know the number of degrees of freedom at compile time? So my question is two fold: a) Is there a problem with just having a single DA and calling DACreateGlobalVector multiple times? Does this affect performance at all (I have many different vectors)? b) Is there a way to use the dof parameter when creating a DA when the number of degrees of freedom is not known at compile time? Specifically, I would like to be able to access the individual values of the vector, just like the example shows... 2) The code I am writing has a lot of different parts which present a lot of opportunities to overlap communication an computation when scattering vectors to update values in the ghost points. Right now, all of my vectors (there are ~50 of them) share a single DA because they all have the same shape. However, by sharing a single DA, I can only scatter one vector at a time. It would be nice to be able to start scattering each vector right after I'm done computing it, and finish scattering it right before I need it again but I can't because other vectors might need to be scattered in between. 
I then re-wrote part of my code so that each vector had its own DA object, but this ended up being incredibly slow (I assume this is because I have so many vectors). My question is, is there a way to scatter multiple vectors simultaneously without affecting the performance of the code? Does it make sense to do this? I'd really appreciate any help... Thanks Milad Fatenejad From knepley at gmail.com Mon May 12 13:56:51 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 13:56:51 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." > > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? It would be nice to get variable structs in C. However, you can just deference the object directly. For example, for 50 degrees of freedom, you can do array[k][j][i][47] *= -1.0; > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? These are all independent objects. Thus, by itself, creating any number of Vecs does nothing to performance (unless you start to run out of memory). > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... see above. > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). The problem here is that buffering will have to be done for each outstanding scatter. 
Thus I see two resolutions: 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. This is essentially what you accomplish with separate DAs. 2) You the dof method. However, this scatter ALL the vectors every time. I do not understand what performance problem you would have with multiple DAs. With any performance questions, we suggest sending the output of -log_summary so we have data to look at. Matt > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From icksa1 at gmail.com Mon May 12 13:58:35 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 13:58:35 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi, Thanks for the reply. If I have two global vectors from the same DA, say global1 and global2 and I try to scatter to local vectors local1 and local2 using the command DAGlobalToLocalBegin/End in the following manner: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); Everything is fine. If instead I do: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); I get the error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! What I would like to be able to do is overlap the scattering and that produces the error. Thanks Milad On Mon, May 12, 2008 at 1:43 PM, wrote: > Milad, > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > the vectors separately. Things seem to work fine. > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > edu> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available 2 Questions about DAs > > > 05/12/2008 12:02 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." 
> > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? > > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... > > > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). > > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > > > From tsjb00 at hotmail.com Mon May 12 13:59:49 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Mon, 12 May 2008 18:59:49 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: Does PETSc include tools or examples to deal with such problems? If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. 
My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. Many thanks in advance! BJ From knepley at gmail.com Mon May 12 14:33:27 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 14:33:27 -0500 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: 2008/5/12 tsjb00 : > > Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. > > Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: > > Does PETSc include tools or examples to deal with such problems? Make a new DA for those vectors. DA are extremely small since they store O(1) data. > If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. > > I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. If you store that data in PETSc Vec format, you can just use VecLoad() and we will distribute everything for you. A simple way to do this, is to read it in on 1 process, put it in a Vec, and VecView(). Then you can read it back in on multiple processes after that. Matt > Many thanks in advance! > > BJ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Amit.Itagi at seagate.com Mon May 12 14:51:38 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 15:51:38 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: I don't understand your motivation for trying to have two consecutive scatterBegin before the scatterEnd of the first. I think that the scatterBegin and the corresponding scatterEnd need to thought of as a single scatter operation. Infact, I don't understand the concept of "overlapping" the scattering. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: 2 Questions about DAs 05/12/2008 02:58 PM Please respond to petsc-users at mcs.a nl.gov Hi, Thanks for the reply.
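Returning to the f(x,y,z) input-file question above, a hedged sketch of the write-once / load-in-parallel recipe Matt outlines. The file name and sizes are invented, VecLoad() is shown with its 2.3.x calling sequence, and note that the loaded vector is split into contiguous chunks, so for DA-shaped data one would normally push it into DA ordering afterwards (for example through a natural-order vector from DACreateNaturalVector() plus DANaturalToGlobalBegin/End()).

/* Step 1 (serial, or guarded by "if (!rank)" in a parallel job): write the
   raw values as a PETSc binary Vec. */
PetscErrorCode WritePropertyFile(const char fname[], PetscScalar *vals, PetscInt n)
{
  PetscErrorCode ierr;
  PetscViewer    viewer;
  Vec            f;
  PetscInt       i;

  PetscFunctionBegin;
  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &f);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    ierr = VecSetValue(f, i, vals[i], INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = VecAssemblyBegin(f);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(f);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, fname, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(f, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);
  ierr = VecDestroy(f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Step 2 (every process of the parallel run): load the same file; PETSc
   hands each process a contiguous share of the entries. */
PetscErrorCode ReadPropertyFile(const char fname[], Vec *f)
{
  PetscErrorCode ierr;
  PetscViewer    viewer;

  PetscFunctionBegin;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, fname, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = VecLoad(viewer, VECMPI, f);CHKERRQ(ierr);   /* 2.3.x calling sequence */
  ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}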
If I have two global vectors from the same DA, say global1 and global2 and I try to scatter to local vectors local1 and local2 using the command DAGlobalToLocalBegin/End in the following manner: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); Everything is fine. If instead I do: DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); I get the error: [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Scatter ctx already in use! What I would like to be able to do is overlap the scattering and that produces the error. Thanks Milad On Mon, May 12, 2008 at 1:43 PM, wrote: > Milad, > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > the vectors separately. Things seem to work fine. > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > edu> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available 2 Questions about DAs > > > 05/12/2008 12:02 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > > Hello: > I have two separate DA questions: > > 1) I am writing a large finite difference code and would like to be > able to represent an array of vectors. I am currently doing this by > creating a single DA and calling DACreateGlobalVector several times, > but the manual also states that: > > "PETSc currently provides no container for multiple arrays sharing the > same distributed array communication; note, however, that the dof > parameter handles many cases of interest." > > I also found the following mailing list thread which describes how to > use the dof parameter to represent several vectors: > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > Where the following solution is proposed: > """ > The easiest thing to do in C is to declare a struct: > > typedef struct { > PetscScalar v[3]; > PetscScalar p; > } Space; > > and then cast pointers > > Space ***array; > > DAVecGetArray(da, u, (void *) &array); > > array[k][j][i].v *= -1.0; > """ > > The problem with the proposed solution, is that they use a struct to > get the individual values, but what if you don't know the number of > degrees of freedom at compile time? > > So my question is two fold: > a) Is there a problem with just having a single DA and calling > DACreateGlobalVector multiple times? Does this affect performance at > all (I have many different vectors)? > b) Is there a way to use the dof parameter when creating a DA when the > number of degrees of freedom is not known at compile time? > Specifically, I would like to be able to access the individual values > of the vector, just like the example shows... > > > 2) The code I am writing has a lot of different parts which present a > lot of opportunities to overlap communication an computation when > scattering vectors to update values in the ghost points. Right now, > all of my vectors (there are ~50 of them) share a single DA because > they all have the same shape. However, by sharing a single DA, I can > only scatter one vector at a time. 
It would be nice to be able to > start scattering each vector right after I'm done computing it, and > finish scattering it right before I need it again but I can't because > other vectors might need to be scattered in between. I then re-wrote > part of my code so that each vector had its own DA object, but this > ended up being incredibly slow (I assume this is because I have so > many vectors). > > My question is, is there a way to scatter multiple vectors > simultaneously without affecting the performance of the code? Does it > make sense to do this? > > > I'd really appreciate any help... > > Thanks > Milad Fatenejad > > > > From icksa1 at gmail.com Mon May 12 15:01:49 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 15:01:49 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hello: I've attached the result of two calculations. The file "log-multi-da" uses 1 DA for each vector (322 in all) and the file "log-single-da" using 1 DA for the entire calculation. When using 322 DA's, about 10x more time is spent in VecScatterBegin and VecScatterEnd. Both were running using two processes I should mention that the source code for these two runs was exactly the same, I didn't reorder the scatters differently. The only difference was the number of DAs Any suggestions? Do you think this is related to the number of DA's, or something else? Thanks for your help Milad On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > It would be nice to get variable structs in C. However, you can just deference > the object directly. For example, for 50 degrees of freedom, you can do > > array[k][j][i][47] *= -1.0; > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > These are all independent objects. Thus, by itself, creating any number of > Vecs does nothing to performance (unless you start to run out of memory). > > > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? 
> > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > see above. > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > The problem here is that buffering will have to be done for each outstanding > scatter. Thus I see two resolutions: > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > This is essentially what you accomplish with separate DAs. > > 2) You the dof method. However, this scatter ALL the vectors every time. > > I do not understand what performance problem you would have with multiple > DAs. With any performance questions, we suggest sending the output of > -log_summary so we have data to look at. > > Matt > > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- A non-text attachment was scrubbed... Name: log-multi-da Type: application/octet-stream Size: 11372 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log-single-da Type: application/octet-stream Size: 11372 bytes Desc: not available URL: From knepley at gmail.com Mon May 12 15:15:45 2008 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 May 2008 15:15:45 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > Hello: > I've attached the result of two calculations. The file "log-multi-da" > uses 1 DA for each vector (322 in all) and the file "log-single-da" > using 1 DA for the entire calculation. When using 322 DA's, about 10x > more time is spent in VecScatterBegin and VecScatterEnd. Both were > running using two processes > > I should mention that the source code for these two runs was exactly > the same, I didn't reorder the scatters differently. The only > difference was the number of DAs > > Any suggestions? Do you think this is related to the number of DA's, > or something else? There are vastly different numbers of reductions and much bigger memory usage. Please send the code and I will look at it. 
Matt > Thanks for your help > Milad > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > Hello: > > > I have two separate DA questions: > > > > > > 1) I am writing a large finite difference code and would like to be > > > able to represent an array of vectors. I am currently doing this by > > > creating a single DA and calling DACreateGlobalVector several times, > > > but the manual also states that: > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > same distributed array communication; note, however, that the dof > > > parameter handles many cases of interest." > > > > > > I also found the following mailing list thread which describes how to > > > use the dof parameter to represent several vectors: > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > Where the following solution is proposed: > > > """ > > > The easiest thing to do in C is to declare a struct: > > > > > > typedef struct { > > > PetscScalar v[3]; > > > PetscScalar p; > > > } Space; > > > > > > and then cast pointers > > > > > > Space ***array; > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > array[k][j][i].v *= -1.0; > > > """ > > > > > > The problem with the proposed solution, is that they use a struct to > > > get the individual values, but what if you don't know the number of > > > degrees of freedom at compile time? > > > > It would be nice to get variable structs in C. However, you can just deference > > the object directly. For example, for 50 degrees of freedom, you can do > > > > array[k][j][i][47] *= -1.0; > > > > > > > So my question is two fold: > > > a) Is there a problem with just having a single DA and calling > > > DACreateGlobalVector multiple times? Does this affect performance at > > > all (I have many different vectors)? > > > > These are all independent objects. Thus, by itself, creating any number of > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > number of degrees of freedom is not known at compile time? > > > Specifically, I would like to be able to access the individual values > > > of the vector, just like the example shows... > > > > > > see above. > > > > > 2) The code I am writing has a lot of different parts which present a > > > lot of opportunities to overlap communication an computation when > > > scattering vectors to update values in the ghost points. Right now, > > > all of my vectors (there are ~50 of them) share a single DA because > > > they all have the same shape. However, by sharing a single DA, I can > > > only scatter one vector at a time. It would be nice to be able to > > > start scattering each vector right after I'm done computing it, and > > > finish scattering it right before I need it again but I can't because > > > other vectors might need to be scattered in between. I then re-wrote > > > part of my code so that each vector had its own DA object, but this > > > ended up being incredibly slow (I assume this is because I have so > > > many vectors). > > > > The problem here is that buffering will have to be done for each outstanding > > scatter. Thus I see two resolutions: > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > This is essentially what you accomplish with separate DAs. > > > > 2) You the dof method. 
However, this scatter ALL the vectors every time. > > > > I do not understand what performance problem you would have with multiple > > DAs. With any performance questions, we suggest sending the output of > > -log_summary so we have data to look at. > > > > Matt > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > simultaneously without affecting the performance of the code? Does it > > > make sense to do this? > > > > > > > > > I'd really appreciate any help... > > > > > > Thanks > > > Milad Fatenejad > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From icksa1 at gmail.com Mon May 12 15:18:34 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 15:18:34 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi Amit: I was thinking of the following situation. There are two vectors global/local1 and global/local2. Function f1 modifies the values of global1 and global2. Function f3 requires the updated ghost point values. Function f2 doesn't depend on either vector at all, but is really computationally intensive I would like to do the following: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values If I don't overlap the scattering, I end up with something like this: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); // Nothing left to do, just wait for scattering to end DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values In the second case, there is nothing left to do while the second vector scatters and it seems like I just have to wait for this to occur. Ideally, I would like to scatter both vectors while waiting for f2() to finish... I'm a little new to all of this, so let me know if my understanding is just wrong... Thanks Milad On Mon, May 12, 2008 at 2:51 PM, wrote: > I don't understand your motivation for trying to have two consecutive > scatterBegin before the scatterEnd of the first. I think that the > scatterBegin and the corresponding scatterEnd need to thought of as a > single scatter operation. Infact, I don't understand the concept of > "overlapping" the scattering. > > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > > > To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: 2 Questions about DAs > > > 05/12/2008 02:58 > > > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > Hi, > Thanks for the reply. 
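One way to get the overlap sketched in the f1/f2/f3 message above, given that each DA owns exactly one global-to-local scatter context: keep a second DA created with identical arguments (so the layouts match) and route the second in-flight ghost update through it. This is only a sketch of Matt's resolution 1, duplicating the communication pattern a small fixed number of times rather than once per vector, and f1/f2/f3 are the user routines from that message.

/* f1, f2, f3 are the user routines from the message above. */
extern void f1(Vec, Vec);
extern void f2(void);
extern void f3(Vec, Vec);

/* da and da_aux are assumed to have been created by two DACreate3d() calls
   with identical arguments, so vectors from either share the same layout.
   At most two scatters are in flight, no matter how many Vecs the code has. */
PetscErrorCode StepWithOverlap(DA da, DA da_aux, Vec global1, Vec global2,
                               Vec local1, Vec local2)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  f1(global1, global2);

  ierr = DAGlobalToLocalBegin(da,     global1, INSERT_VALUES, local1);CHKERRQ(ierr);
  ierr = DAGlobalToLocalBegin(da_aux, global2, INSERT_VALUES, local2);CHKERRQ(ierr);

  f2();   /* long, independent computation overlapped with both ghost updates */

  ierr = DAGlobalToLocalEnd(da,     global1, INSERT_VALUES, local1);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(da_aux, global2, INSERT_VALUES, local2);CHKERRQ(ierr);

  f3(local1, local2);
  PetscFunctionReturn(0);
}

Whether the overlap actually pays off depends on how much of the communication the MPI implementation progresses during f2, which is worth checking with -log_summary.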
> > If I have two global vectors from the same DA, say global1 and global2 > and I try to scatter to local vectors local1 and local2 using the > command DAGlobalToLocalBegin/End in the following manner: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > Everything is fine. If instead I do: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > I get the error: > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Scatter ctx already in use! > > What I would like to be able to do is overlap the scattering and that > produces the error. > > Thanks > Milad > > On Mon, May 12, 2008 at 1:43 PM, wrote: > > Milad, > > > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > > the vectors separately. Things seem to work fine. > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > "Milad Fatenejad" > > > edu> > To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users > cc > > @mcs.anl.gov > > No Phone Info > Subject > > Available 2 Questions about DAs > > > > > > 05/12/2008 12:02 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? > > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. 
Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > > > > > From Amit.Itagi at seagate.com Mon May 12 16:14:07 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Mon, 12 May 2008 17:14:07 -0400 Subject: 2 Questions about DAs In-Reply-To: Message-ID: May be, if you compare the time taken to scatter and the time taken by f2, one of them might dominate the other. Thus, in any case, the slower of the two will determine the time to complete the scattering and f2 execution. Thanks Rgds, Amit "Milad Fatenejad" To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: 2 Questions about DAs 05/12/2008 04:18 PM Please respond to petsc-users at mcs.a nl.gov Hi Amit: I was thinking of the following situation. There are two vectors global/local1 and global/local2. Function f1 modifies the values of global1 and global2. Function f3 requires the updated ghost point values. Function f2 doesn't depend on either vector at all, but is really computationally intensive I would like to do the following: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values If I don't overlap the scattering, I end up with something like this: f1(global1, global2) // Both vectors modified, I need to scatter before calling f3() DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); f2() // function that takes a really long time DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); // Nothing left to do, just wait for scattering to end DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); f3(local1, local2) // Needs the ghost values In the second case, there is nothing left to do while the second vector scatters and it seems like I just have to wait for this to occur. Ideally, I would like to scatter both vectors while waiting for f2() to finish... I'm a little new to all of this, so let me know if my understanding is just wrong... Thanks Milad On Mon, May 12, 2008 at 2:51 PM, wrote: > I don't understand your motivation for trying to have two consecutive > scatterBegin before the scatterEnd of the first. I think that the > scatterBegin and the corresponding scatterEnd need to thought of as a > single scatter operation. Infact, I don't understand the concept of > "overlapping" the scattering. 
> > > Thanks > > Rgds, > Amit > > > > > "Milad Fatenejad" > > > To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: 2 Questions about DAs > > > 05/12/2008 02:58 > > > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > Hi, > Thanks for the reply. > > If I have two global vectors from the same DA, say global1 and global2 > and I try to scatter to local vectors local1 and local2 using the > command DAGlobalToLocalBegin/End in the following manner: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > Everything is fine. If instead I do: > > DAGlobalToLocalBegin(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalBegin(da, global2, INSERT_VALUES, local2); > > DAGlobalToLocalEnd(da, global1, INSERT_VALUES, local1); > DAGlobalToLocalEnd(da, global2, INSERT_VALUES, local2); > > I get the error: > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Scatter ctx already in use! > > What I would like to be able to do is overlap the scattering and that > produces the error. > > Thanks > Milad > > On Mon, May 12, 2008 at 1:43 PM, wrote: > > Milad, > > > > I have a DA with 6 vectors sharing the DA structure. I defined the DA to > > have just 1 DOF. I generated the 6 vectors from that DA. Also, I scatter > > the vectors separately. Things seem to work fine. > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > "Milad Fatenejad" > > > edu> > To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users > cc > > @mcs.anl.gov > > No Phone Info > Subject > > Available 2 Questions about DAs > > > > > > 05/12/2008 12:02 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > > > Hello: > > I have two separate DA questions: > > > > 1) I am writing a large finite difference code and would like to be > > able to represent an array of vectors. I am currently doing this by > > creating a single DA and calling DACreateGlobalVector several times, > > but the manual also states that: > > > > "PETSc currently provides no container for multiple arrays sharing the > > same distributed array communication; note, however, that the dof > > parameter handles many cases of interest." > > > > I also found the following mailing list thread which describes how to > > use the dof parameter to represent several vectors: > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > Where the following solution is proposed: > > """ > > The easiest thing to do in C is to declare a struct: > > > > typedef struct { > > PetscScalar v[3]; > > PetscScalar p; > > } Space; > > > > and then cast pointers > > > > Space ***array; > > > > DAVecGetArray(da, u, (void *) &array); > > > > array[k][j][i].v *= -1.0; > > """ > > > > The problem with the proposed solution, is that they use a struct to > > get the individual values, but what if you don't know the number of > > degrees of freedom at compile time? > > > > So my question is two fold: > > a) Is there a problem with just having a single DA and calling > > DACreateGlobalVector multiple times? Does this affect performance at > > all (I have many different vectors)? > > b) Is there a way to use the dof parameter when creating a DA when the > > number of degrees of freedom is not known at compile time? 
> > Specifically, I would like to be able to access the individual values > > of the vector, just like the example shows... > > > > > > 2) The code I am writing has a lot of different parts which present a > > lot of opportunities to overlap communication an computation when > > scattering vectors to update values in the ghost points. Right now, > > all of my vectors (there are ~50 of them) share a single DA because > > they all have the same shape. However, by sharing a single DA, I can > > only scatter one vector at a time. It would be nice to be able to > > start scattering each vector right after I'm done computing it, and > > finish scattering it right before I need it again but I can't because > > other vectors might need to be scattered in between. I then re-wrote > > part of my code so that each vector had its own DA object, but this > > ended up being incredibly slow (I assume this is because I have so > > many vectors). > > > > My question is, is there a way to scatter multiple vectors > > simultaneously without affecting the performance of the code? Does it > > make sense to do this? > > > > > > I'd really appreciate any help... > > > > Thanks > > Milad Fatenejad > > > > > > > > > > > > From icksa1 at gmail.com Mon May 12 16:28:50 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 16:28:50 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi Matt: The code is several thousand lines long, requires many external libraries and is generally very messy right now. I'd rather not send it because I wouldn't want to take up too much of your time. I think I will try to go back and try set up some simpler problems to test the difference between 1 vs. many DA's, and will write back if I have the same issue. Thank you Milad On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley wrote: > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > > Hello: > > I've attached the result of two calculations. The file "log-multi-da" > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > running using two processes > > > > I should mention that the source code for these two runs was exactly > > the same, I didn't reorder the scatters differently. The only > > difference was the number of DAs > > > > Any suggestions? Do you think this is related to the number of DA's, > > or something else? > > There are vastly different numbers of reductions and much bigger memory usage. > Please send the code and I will look at it. > > Matt > > > > > Thanks for your help > > Milad > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > > Hello: > > > > I have two separate DA questions: > > > > > > > > 1) I am writing a large finite difference code and would like to be > > > > able to represent an array of vectors. I am currently doing this by > > > > creating a single DA and calling DACreateGlobalVector several times, > > > > but the manual also states that: > > > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > > same distributed array communication; note, however, that the dof > > > > parameter handles many cases of interest." 
> > > > > > > > I also found the following mailing list thread which describes how to > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > Where the following solution is proposed: > > > > """ > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > typedef struct { > > > > PetscScalar v[3]; > > > > PetscScalar p; > > > > } Space; > > > > > > > > and then cast pointers > > > > > > > > Space ***array; > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > array[k][j][i].v *= -1.0; > > > > """ > > > > > > > > The problem with the proposed solution, is that they use a struct to > > > > get the individual values, but what if you don't know the number of > > > > degrees of freedom at compile time? > > > > > > It would be nice to get variable structs in C. However, you can just deference > > > the object directly. For example, for 50 degrees of freedom, you can do > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > So my question is two fold: > > > > a) Is there a problem with just having a single DA and calling > > > > DACreateGlobalVector multiple times? Does this affect performance at > > > > all (I have many different vectors)? > > > > > > These are all independent objects. Thus, by itself, creating any number of > > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > > number of degrees of freedom is not known at compile time? > > > > Specifically, I would like to be able to access the individual values > > > > of the vector, just like the example shows... > > > > > > > > > see above. > > > > > > > 2) The code I am writing has a lot of different parts which present a > > > > lot of opportunities to overlap communication an computation when > > > > scattering vectors to update values in the ghost points. Right now, > > > > all of my vectors (there are ~50 of them) share a single DA because > > > > they all have the same shape. However, by sharing a single DA, I can > > > > only scatter one vector at a time. It would be nice to be able to > > > > start scattering each vector right after I'm done computing it, and > > > > finish scattering it right before I need it again but I can't because > > > > other vectors might need to be scattered in between. I then re-wrote > > > > part of my code so that each vector had its own DA object, but this > > > > ended up being incredibly slow (I assume this is because I have so > > > > many vectors). > > > > > > The problem here is that buffering will have to be done for each outstanding > > > scatter. Thus I see two resolutions: > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > > This is essentially what you accomplish with separate DAs. > > > > > > 2) You the dof method. However, this scatter ALL the vectors every time. > > > > > > I do not understand what performance problem you would have with multiple > > > DAs. With any performance questions, we suggest sending the output of > > > -log_summary so we have data to look at. > > > > > > Matt > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > simultaneously without affecting the performance of the code? Does it > > > > make sense to do this? > > > > > > > > > > > > I'd really appreciate any help... 
> > > > > > > > Thanks > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > From icksa1 at gmail.com Mon May 12 18:21:07 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Mon, 12 May 2008 18:21:07 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hi: I created a simple test problem that demonstrates the issue. In the test problem, 100 vectors are created using: single.cpp: a single distributed array and multi.cpp: 100 distributed arrays Some math is performed on the vectors, then they are scattered to local vectors.. The log summary (running 2 processes) shows that multi.cpp uses more memory and performs more reductions than single.cpp, which is similar to the experience I had with my program... I hope this helps Milad On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley wrote: > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad wrote: > > Hello: > > I've attached the result of two calculations. The file "log-multi-da" > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > running using two processes > > > > I should mention that the source code for these two runs was exactly > > the same, I didn't reorder the scatters differently. The only > > difference was the number of DAs > > > > Any suggestions? Do you think this is related to the number of DA's, > > or something else? > > There are vastly different numbers of reductions and much bigger memory usage. > Please send the code and I will look at it. > > Matt > > > > > Thanks for your help > > Milad > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley wrote: > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad wrote: > > > > Hello: > > > > I have two separate DA questions: > > > > > > > > 1) I am writing a large finite difference code and would like to be > > > > able to represent an array of vectors. I am currently doing this by > > > > creating a single DA and calling DACreateGlobalVector several times, > > > > but the manual also states that: > > > > > > > > "PETSc currently provides no container for multiple arrays sharing the > > > > same distributed array communication; note, however, that the dof > > > > parameter handles many cases of interest." 
> > > > > > > > I also found the following mailing list thread which describes how to > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > Where the following solution is proposed: > > > > """ > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > typedef struct { > > > > PetscScalar v[3]; > > > > PetscScalar p; > > > > } Space; > > > > > > > > and then cast pointers > > > > > > > > Space ***array; > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > array[k][j][i].v *= -1.0; > > > > """ > > > > > > > > The problem with the proposed solution, is that they use a struct to > > > > get the individual values, but what if you don't know the number of > > > > degrees of freedom at compile time? > > > > > > It would be nice to get variable structs in C. However, you can just deference > > > the object directly. For example, for 50 degrees of freedom, you can do > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > So my question is two fold: > > > > a) Is there a problem with just having a single DA and calling > > > > DACreateGlobalVector multiple times? Does this affect performance at > > > > all (I have many different vectors)? > > > > > > These are all independent objects. Thus, by itself, creating any number of > > > Vecs does nothing to performance (unless you start to run out of memory). > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when the > > > > number of degrees of freedom is not known at compile time? > > > > Specifically, I would like to be able to access the individual values > > > > of the vector, just like the example shows... > > > > > > > > > see above. > > > > > > > 2) The code I am writing has a lot of different parts which present a > > > > lot of opportunities to overlap communication an computation when > > > > scattering vectors to update values in the ghost points. Right now, > > > > all of my vectors (there are ~50 of them) share a single DA because > > > > they all have the same shape. However, by sharing a single DA, I can > > > > only scatter one vector at a time. It would be nice to be able to > > > > start scattering each vector right after I'm done computing it, and > > > > finish scattering it right before I need it again but I can't because > > > > other vectors might need to be scattered in between. I then re-wrote > > > > part of my code so that each vector had its own DA object, but this > > > > ended up being incredibly slow (I assume this is because I have so > > > > many vectors). > > > > > > The problem here is that buffering will have to be done for each outstanding > > > scatter. Thus I see two resolutions: > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once. > > > This is essentially what you accomplish with separate DAs. > > > > > > 2) You the dof method. However, this scatter ALL the vectors every time. > > > > > > I do not understand what performance problem you would have with multiple > > > DAs. With any performance questions, we suggest sending the output of > > > -log_summary so we have data to look at. > > > > > > Matt > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > simultaneously without affecting the performance of the code? Does it > > > > make sense to do this? > > > > > > > > > > > > I'd really appreciate any help... 
> > > > > > > > Thanks > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- A non-text attachment was scrubbed... Name: log-multi Type: application/octet-stream Size: 10666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log-single Type: application/octet-stream Size: 10666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: multi.cpp Type: text/x-c++src Size: 1557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: single.cpp Type: text/x-c++src Size: 1449 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon May 12 19:07:33 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 May 2008 19:07:33 -0500 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> On May 12, 2008, at 1:59 PM, tsjb00 wrote: > > Hi, there! I am a beginner of PETSc and I have some questions about > using PETSc to solve for a multi-componet system. The code is > supposed to be applicable to different systems, where number of > components, properties of components ,etc. would be input for the > program. > > Say I define DA with dof=number of components = nc, number of grid > in x,y,z = nx,ny,nz respectively. When I use DA related functions, > it seems that by default the data objects (vectors, arrays, etc.) > would be of nx*ny*nz*nc. However, some physical variables are > independent of specific components, which means I need to handle > data objects of nx*ny*nz*integral. My questions are: > > Does PETSc include tools or examples to deal with such problems? > > If not, how can I make sure the 'nx*ny*nz*any integral' data objects > are distributed over the nodes in a way defined by DA? I am using > PETSc_Decide for partitioning right now. I would prefer that at > least the number of processors be flexible. I do not understand your question but here is a stab at it. For each different nc you need in your code you simply create a different DA. For example if you have two fields you want together and also have three fields together you would create one DA for the 2 dof and one for the 3 dof. Adding a couple more DA's won't take much memory or time. You can use DAVecGetArray() to access the values for a fixed dof, if sometimes you want nc to be different for different runs within the same loops you can use DAVecGetArrayDOF(). You access values via x[k][j][i] [l] where l goes from 0 to dof-1 for the dof that you used to create the DA. Barry > > > I need to read in a property f(x,y,z) from a data file and then > distribute the data across different processors. Any suggestions on > this would be appreciated. My concern is that if I use MPI_Send/ > Receive, the data to be transferred might correspond to > discontinuous indices due to the partitioning. > > Many thanks in advance! 
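Barry's suggestion of one DA per distinct dof can be sketched as follows, using 2.3.x calls; the grid size, the field names and nc are illustrative. With the same global grid, the same number of processes and PETSC_DECIDE everywhere, the two DAs normally end up with matching decompositions, which is worth confirming with DAGetCorners() in a real code.

#include "petscda.h"

int main(int argc, char **argv)
{
  DA             da_comp, da_prop;
  Vec            conc, poro;        /* nc components per cell / one per cell */
  PetscInt       nc = 3;            /* number of components, e.g. from input */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* same grid, same PETSC_DECIDE partitioning, different dof */
  ierr = DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
                    40, 40, 40, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                    nc, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da_comp);CHKERRQ(ierr);
  ierr = DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
                    40, 40, 40, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                    1, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da_prop);CHKERRQ(ierr);

  ierr = DACreateGlobalVector(da_comp, &conc);CHKERRQ(ierr); /* nx*ny*nz*nc entries */
  ierr = DACreateGlobalVector(da_prop, &poro);CHKERRQ(ierr); /* nx*ny*nz entries    */

  /* access: x[k][j][i][c] via DAVecGetArrayDOF(da_comp, conc, ...) and
     p[k][j][i] via DAVecGetArray(da_prop, poro, ...), over the same
     (i,j,k) ranges returned by DAGetCorners() */

  ierr = VecDestroy(conc);CHKERRQ(ierr);
  ierr = VecDestroy(poro);CHKERRQ(ierr);
  ierr = DADestroy(da_comp);CHKERRQ(ierr);
  ierr = DADestroy(da_prop);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}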
> > BJ > >> > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > From bsmith at mcs.anl.gov Mon May 12 19:22:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 May 2008 19:22:04 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: A couple of items. Overlapping communication and computation is pretty much a myth. The CPU is used by MPI to pack the messages and put them on the network so it is not available for computation during this time. Usually if you try to overlap communication and computation it will end up being slower and I've never seen it faster. Vendors will try to trick you into buying a machine by saying it does it, but it really doesn't. Just forget about trying to do it. Creating a DA involves a good amount of setup and some communication; it is fine to use a few DA's but setting up hundreds of DAs is not a good idea UNLESS YOU DO TONS OF WORK for each DA. In your case you are doing just a tiny amount of communication with each DA so the DA setup time is dominating. If you have hundreds of vectors that you wish to communicate AT THE SAME TIME (seems strange but I suppose it is possible), then rather than having hundreds of DAGlobalToLocalBegin/End() in a row you will want to create an additional "meta" DA that has the same m,n,p as the regular DA but has a dof equal to the number of vectors you wish to communicate at the same time. Use VecStrideScatterAll() to get the individual vectors into a meta vector, do the DAGlobalToLocalBegin/End() on the meta vector to get the ghost values and then use DAStrideGatherAll() to get the values into the 322 individual ghosted vectors. The reason to do it this way is so the values in all the vectors are all sent together in a single MPI message instead of the separate message that would needed for each of the small DAGlobalToLocalBegin/End(). Barry On May 12, 2008, at 6:21 PM, Milad Fatenejad wrote: > Hi: > I created a simple test problem that demonstrates the issue. In the > test problem, 100 vectors are created using: > single.cpp: a single distributed array and > multi.cpp: 100 distributed arrays > > Some math is performed on the vectors, then they are scattered to > local vectors.. > > The log summary (running 2 processes) shows that multi.cpp uses more > memory and performs more reductions than single.cpp, which is similar > to the experience I had with my program... > > I hope this helps > Milad > > On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley > wrote: >> On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad >> wrote: >>> Hello: >>> I've attached the result of two calculations. The file "log-multi- >>> da" >>> uses 1 DA for each vector (322 in all) and the file "log-single-da" >>> using 1 DA for the entire calculation. When using 322 DA's, about >>> 10x >>> more time is spent in VecScatterBegin and VecScatterEnd. Both were >>> running using two processes >>> >>> I should mention that the source code for these two runs was exactly >>> the same, I didn't reorder the scatters differently. The only >>> difference was the number of DAs >>> >>> Any suggestions? Do you think this is related to the number of DA's, >>> or something else? >> >> There are vastly different numbers of reductions and much bigger >> memory usage. >> Please send the code and I will look at it. 
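The "meta DA" idea from Barry's message above, written out as a sketch. Assumptions: PETSc 2.3.x names; dameta was created with the same grid and m, n, p as the dof=1 DA used for the individual fields, but with dof equal to the number of fields; gvec[] are global vectors and lvec[] local (ghosted) vectors of the dof=1 DA, in matching order; and the routine Barry calls DAStrideGatherAll() is taken to be VecStrideGatherAll(). The exact argument order of the VecStride*All calls should be checked against the man pages of the installed version.

#include "petscda.h"

/* Ghost-update many single-component vectors with one message per
   neighbour by packing them into an interlaced "meta" vector. */
PetscErrorCode UpdateGhostsTogether(DA dameta, Vec gvec[], Vec lvec[])
{
  Vec            gmeta, lmeta;   /* dof = number-of-fields work vectors */
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* in a real code these two would be created once and reused */
  ierr = DACreateGlobalVector(dameta, &gmeta);CHKERRQ(ierr);
  ierr = DACreateLocalVector(dameta, &lmeta);CHKERRQ(ierr);

  /* pack the separate global vectors into the interlaced meta vector */
  ierr = VecStrideScatterAll(gvec, gmeta, INSERT_VALUES);CHKERRQ(ierr);

  /* a single ghost exchange moves all fields at once */
  ierr = DAGlobalToLocalBegin(dameta, gmeta, INSERT_VALUES, lmeta);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(dameta, gmeta, INSERT_VALUES, lmeta);CHKERRQ(ierr);

  /* unpack into the individual ghosted (local) vectors */
  ierr = VecStrideGatherAll(lmeta, lvec, INSERT_VALUES);CHKERRQ(ierr);

  ierr = VecDestroy(gmeta);CHKERRQ(ierr);
  ierr = VecDestroy(lmeta);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The point is simply that all fields travel in one VecScatter instead of one message exchange per field.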
>> >> Matt >> >> >> >>> Thanks for your help >>> Milad >>> >>> On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley >>> wrote: >>>> >>>> On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad >>> > wrote: >>>>> Hello: >>>>> I have two separate DA questions: >>>>> >>>>> 1) I am writing a large finite difference code and would like to >>>>> be >>>>> able to represent an array of vectors. I am currently doing this >>>>> by >>>>> creating a single DA and calling DACreateGlobalVector several >>>>> times, >>>>> but the manual also states that: >>>>> >>>>> "PETSc currently provides no container for multiple arrays >>>>> sharing the >>>>> same distributed array communication; note, however, that the dof >>>>> parameter handles many cases of interest." >>>>> >>>>> I also found the following mailing list thread which describes >>>>> how to >>>>> use the dof parameter to represent several vectors: >>>>> >>>>> >>>>> http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html >>>>> >>>>> Where the following solution is proposed: >>>>> """ >>>>> The easiest thing to do in C is to declare a struct: >>>>> >>>>> typedef struct { >>>>> PetscScalar v[3]; >>>>> PetscScalar p; >>>>> } Space; >>>>> >>>>> and then cast pointers >>>>> >>>>> Space ***array; >>>>> >>>>> DAVecGetArray(da, u, (void *) &array); >>>>> >>>>> array[k][j][i].v *= -1.0; >>>>> """ >>>>> >>>>> The problem with the proposed solution, is that they use a >>>>> struct to >>>>> get the individual values, but what if you don't know the number >>>>> of >>>>> degrees of freedom at compile time? >>>> >>>> It would be nice to get variable structs in C. However, you can >>>> just deference >>>> the object directly. For example, for 50 degrees of freedom, you >>>> can do >>>> >>>> array[k][j][i][47] *= -1.0; >>>> >>>> >>>>> So my question is two fold: >>>>> a) Is there a problem with just having a single DA and calling >>>>> DACreateGlobalVector multiple times? Does this affect >>>>> performance at >>>>> all (I have many different vectors)? >>>> >>>> These are all independent objects. Thus, by itself, creating any >>>> number of >>>> Vecs does nothing to performance (unless you start to run out of >>>> memory). >>>> >>>> >>>>> b) Is there a way to use the dof parameter when creating a DA >>>>> when the >>>>> number of degrees of freedom is not known at compile time? >>>>> Specifically, I would like to be able to access the individual >>>>> values >>>>> of the vector, just like the example shows... >>>> >>>> >>>> see above. >>>> >>>>> 2) The code I am writing has a lot of different parts which >>>>> present a >>>>> lot of opportunities to overlap communication an computation when >>>>> scattering vectors to update values in the ghost points. Right >>>>> now, >>>>> all of my vectors (there are ~50 of them) share a single DA >>>>> because >>>>> they all have the same shape. However, by sharing a single DA, I >>>>> can >>>>> only scatter one vector at a time. It would be nice to be able to >>>>> start scattering each vector right after I'm done computing it, >>>>> and >>>>> finish scattering it right before I need it again but I can't >>>>> because >>>>> other vectors might need to be scattered in between. I then re- >>>>> wrote >>>>> part of my code so that each vector had its own DA object, but >>>>> this >>>>> ended up being incredibly slow (I assume this is because I have so >>>>> many vectors). >>>> >>>> The problem here is that buffering will have to be done for each >>>> outstanding >>>> scatter. 
Thus I see two resolutions: >>>> >>>> 1) Duplicate the DA scatter for as many Vecs as you wish to >>>> scatter at once. >>>> This is essentially what you accomplish with separate DAs. >>>> >>>> 2) You the dof method. However, this scatter ALL the vectors >>>> every time. >>>> >>>> I do not understand what performance problem you would have with >>>> multiple >>>> DAs. With any performance questions, we suggest sending the >>>> output of >>>> -log_summary so we have data to look at. >>>> >>>> Matt >>>> >>>> >>>> >>>>> My question is, is there a way to scatter multiple vectors >>>>> simultaneously without affecting the performance of the code? >>>>> Does it >>>>> make sense to do this? >>>>> >>>>> >>>>> I'd really appreciate any help... >>>>> >>>>> Thanks >>>>> Milad Fatenejad >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to >>>> which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >> >> >> >> -- >> >> >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> > From neckel at in.tum.de Tue May 13 00:42:08 2008 From: neckel at in.tum.de (Tobias Neckel) Date: Tue, 13 May 2008 07:42:08 +0200 Subject: Question on matrix preallocation In-Reply-To: References: <48230924.7040600@in.tum.de> Message-ID: <48292A30.8040803@in.tum.de> > You are assembling before inserting values. This wipes out the preallocation > information since assembly shrinks the matrix to an optimal size. Sorry for being quite late, but thanks a lot, Matt! The assembling caused the problem also in my real code ... small cause, large effect. Now it is running fast :-) Best regards Tobias -- Dipl.-Tech. Math. Tobias Neckel Institut f?r Informatik V, TU M?nchen Boltzmannstr. 3, 85748 Garching Tel.: 089/289-18602 Email: neckel at in.tum.de URL: http://www5.in.tum.de/persons/neckel.html From gaatenek at irisa.fr Tue May 13 08:10:02 2008 From: gaatenek at irisa.fr (gaatenek at irisa.fr) Date: Tue, 13 May 2008 15:10:02 +0200 (CEST) Subject: MatILUDTFactor In-Reply-To: <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> References: <48209347.1060500@fluorem.com> <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> Message-ID: <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> Hello, I am trying to use MatILUDTFactor make an incomplete factorisation for preconditionner. I did not if it work correctly. In general case, when you reduce you drop tolerance criteria your preconditionner become better, but I all example that I use he dot not change. I dont know if I use it well? Do I have to install SPARSEKIT2 to make it work well? Guy Atenekeng From knepley at gmail.com Tue May 13 08:21:14 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 13 May 2008 08:21:14 -0500 Subject: MatILUDTFactor In-Reply-To: <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> References: <48209347.1060500@fluorem.com> <24F38514-B95B-4EFB-A1CB-DE832C8C1C9E@mcs.anl.gov> <42920.131.254.11.127.1210684202.squirrel@mail.irisa.fr> Message-ID: On Tue, May 13, 2008 at 8:10 AM, wrote: > Hello, > I am trying to use MatILUDTFactor make an incomplete factorisation for > preconditioner. > I did not if it work correctly. In general case, when you reduce you drop > tolerance criteria your preconditioner become better, but I all example > that I use he dot not change. 
I don't know if I use it well? Do I have to > install SPARSEKIT2 to make it work well? 1) ILU does not necessarily improve with increasing fill. There are no theoretical results for this PC. 2) I would first run with -ksp_view to see exactly what you have Matt > Guy Atenekeng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From tsjb00 at hotmail.com Tue May 13 10:14:05 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Tue, 13 May 2008 15:14:05 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> References: <5BABB255-02D5-40E9-9899-0B62CB3AE7C0@mcs.anl.gov> Message-ID: Many thanks for your help! ---------------------------------------- > From: bsmith at mcs.anl.gov > To: petsc-users at mcs.anl.gov > Subject: Re: Q. of multi-componet system and data from input file > Date: Mon, 12 May 2008 19:07:33 -0500 > > > On May 12, 2008, at 1:59 PM, tsjb00 wrote: > >> >> Hi, there! I am a beginner of PETSc and I have some questions about >> using PETSc to solve for a multi-componet system. The code is >> supposed to be applicable to different systems, where number of >> components, properties of components ,etc. would be input for the >> program. >> >> Say I define DA with dof=number of components = nc, number of grid >> in x,y,z = nx,ny,nz respectively. When I use DA related functions, >> it seems that by default the data objects (vectors, arrays, etc.) >> would be of nx*ny*nz*nc. However, some physical variables are >> independent of specific components, which means I need to handle >> data objects of nx*ny*nz*integral. My questions are: >> >> Does PETSc include tools or examples to deal with such problems? >> >> If not, how can I make sure the 'nx*ny*nz*any integral' data objects >> are distributed over the nodes in a way defined by DA? I am using >> PETSc_Decide for partitioning right now. I would prefer that at >> least the number of processors be flexible. > > I do not understand your question but here is a stab at it. For > each different nc you need in your code you simply create a different > DA. For example if you have > two fields you want together and also have three fields together you > would create one DA for the 2 dof and one for the 3 dof. Adding a > couple more DA's won't take > much memory or time. > > You can use DAVecGetArray() to access the values for a fixed dof, > if sometimes you want nc to be different for different runs within the > same > loops you can use DAVecGetArrayDOF(). You access values via x[k][j][i] > [l] where l goes from 0 to dof-1 for the dof that you used to create > the DA. > > Barry > >> >> >> I need to read in a property f(x,y,z) from a data file and then >> distribute the data across different processors. Any suggestions on >> this would be appreciated. My concern is that if I use MPI_Send/ >> Receive, the data to be transferred might correspond to >> discontinuous indices due to the partitioning. >> >> Many thanks in advance! >> >> BJ >> >>> >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> >> > _________________________________________________________________ Windows Live Photo gallery ????????????????????????????? 
http://get.live.cn/product/photo.html From tsjb00 at hotmail.com Tue May 13 10:14:38 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Tue, 13 May 2008 15:14:38 +0000 Subject: Q. of multi-componet system and data from input file In-Reply-To: References: Message-ID: Many thanks for the reply! It really helps! ---------------------------------------- > Date: Mon, 12 May 2008 14:33:27 -0500 > From: knepley at gmail.com > To: petsc-users at mcs.anl.gov > Subject: Re: Q. of multi-componet system and data from input file > > 2008/5/12 tsjb00 : >> >> Hi, there! I am a beginner of PETSc and I have some questions about using PETSc to solve for a multi-componet system. The code is supposed to be applicable to different systems, where number of components, properties of components ,etc. would be input for the program. >> >> Say I define DA with dof=number of components = nc, number of grid in x,y,z = nx,ny,nz respectively. When I use DA related functions, it seems that by default the data objects (vectors, arrays, etc.) would be of nx*ny*nz*nc. However, some physical variables are independent of specific components, which means I need to handle data objects of nx*ny*nz*integral. My questions are: >> >> Does PETSc include tools or examples to deal with such problems? > > Make a new DA for those vectors. DA are extremely small since they > store O(1) data. > >> If not, how can I make sure the 'nx*ny*nz*any integral' data objects are distributed over the nodes in a way defined by DA? I am using PETSc_Decide for partitioning right now. I would prefer that at least the number of processors be flexible. >> >> I need to read in a property f(x,y,z) from a data file and then distribute the data across different processors. Any suggestions on this would be appreciated. My concern is that if I use MPI_Send/Receive, the data to be transferred might correspond to discontinuous indices due to the partitioning. > > If you store that data in PETSc Vec format, you can just use VecLoad() > and we will distribute everything for you. A > simple way to do this, is to read it in on 1 process, put it in a Vec, > and VecView(). Then you can read it back in on > multiple processes after that. > > Matt > >> Many thanks in advance! >> >> BJ >> >> > >> >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > _________________________________________________________________ ???MSN??????????????????? http://mobile.msn.com.cn/ From icksa1 at gmail.com Tue May 13 10:46:35 2008 From: icksa1 at gmail.com (Milad Fatenejad) Date: Tue, 13 May 2008 10:46:35 -0500 Subject: 2 Questions about DAs In-Reply-To: References: Message-ID: Hello: Thanks for all of your help, this has helped me tremendously! Milad On Mon, May 12, 2008 at 7:22 PM, Barry Smith wrote: > > A couple of items. > > Overlapping communication and computation is pretty much a myth. The CPU > is used by MPI to pack > the messages and put them on the network so it is not available for > computation during this time. Usually > if you try to overlap communication and computation it will end up being > slower and I've never seen it faster. > Vendors will try to trick you into buying a machine by saying it does it, > but it really doesn't. Just forget about trying to do it. 
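Following the advice just quoted, the natural structure is to issue the Begin/End pair back to back, immediately before the stencil loop that needs the ghost values. A sketch for a dof=1 3d DA with 2.3.x names; it assumes a periodic DA (e.g. created with DA_XYZPERIODIC and stencil width 1) so that every owned point has ghost neighbours, and the 7-point average stands in for the real finite-difference stencil.

#include "petscda.h"

/* Update gnew from g: local exchange first, stencil loop second. */
PetscErrorCode StencilUpdate(DA da, Vec g, Vec gnew)
{
  Vec            local;            /* created here for brevity; reuse it in a real code */
  PetscScalar ***x, ***y;
  PetscInt       i, j, k, xs, ys, zs, xm, ym, zm;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DACreateLocalVector(da, &local);CHKERRQ(ierr);

  /* ghost update: Begin and End together, right before the values are
     needed; splitting them to "overlap" rarely buys anything in practice */
  ierr = DAGlobalToLocalBegin(da, g, INSERT_VALUES, local);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(da, g, INSERT_VALUES, local);CHKERRQ(ierr);

  ierr = DAVecGetArray(da, local, &x);CHKERRQ(ierr);
  ierr = DAVecGetArray(da, gnew, &y);CHKERRQ(ierr);
  ierr = DAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++)
    for (j = ys; j < ys + ym; j++)
      for (i = xs; i < xs + xm; i++)
        /* placeholder stencil; ghost entries of 'local' are valid here */
        y[k][j][i] = (x[k][j][i] + x[k][j][i-1] + x[k][j][i+1]
                      + x[k][j-1][i] + x[k][j+1][i]
                      + x[k-1][j][i] + x[k+1][j][i]) / 7.0;
  ierr = DAVecRestoreArray(da, local, &x);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(da, gnew, &y);CHKERRQ(ierr);

  ierr = VecDestroy(local);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}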
> > Creating a DA involves a good amount of setup and some communication; it > is fine to use a few DA's > but setting up hundreds of DAs is not a good idea UNLESS YOU DO TONS OF > WORK for each DA. > In your case you are doing just a tiny amount of communication with each > DA so the DA setup time > is dominating. > > If you have hundreds of vectors that you wish to communicate AT THE SAME > TIME (seems strange but > I suppose it is possible), then rather than having hundreds of > DAGlobalToLocalBegin/End() in a row > you will want to create an additional "meta" DA that has the same m,n,p as > the regular DA but has a > dof equal to the number of vectors you wish to communicate at the same > time. Use VecStrideScatterAll() > to get the individual vectors into a meta vector, do the > DAGlobalToLocalBegin/End() on the meta vector > to get the ghost values and then use DAStrideGatherAll() to get the values > into the 322 individual ghosted > vectors. The reason to do it this way is so the values in all the vectors > are all sent together in a single > MPI message instead of the separate message that would needed for each of > the small > DAGlobalToLocalBegin/End(). > > Barry > > > > > > On May 12, 2008, at 6:21 PM, Milad Fatenejad wrote: > > > > > > > > > > Hi: > > I created a simple test problem that demonstrates the issue. In the > > test problem, 100 vectors are created using: > > single.cpp: a single distributed array and > > multi.cpp: 100 distributed arrays > > > > Some math is performed on the vectors, then they are scattered to > > local vectors.. > > > > The log summary (running 2 processes) shows that multi.cpp uses more > > memory and performs more reductions than single.cpp, which is similar > > to the experience I had with my program... > > > > I hope this helps > > Milad > > > > On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley > wrote: > > > > > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad > wrote: > > > > > > > Hello: > > > > I've attached the result of two calculations. The file "log-multi-da" > > > > uses 1 DA for each vector (322 in all) and the file "log-single-da" > > > > using 1 DA for the entire calculation. When using 322 DA's, about 10x > > > > more time is spent in VecScatterBegin and VecScatterEnd. Both were > > > > running using two processes > > > > > > > > I should mention that the source code for these two runs was exactly > > > > the same, I didn't reorder the scatters differently. The only > > > > difference was the number of DAs > > > > > > > > Any suggestions? Do you think this is related to the number of DA's, > > > > or something else? > > > > > > > > > > There are vastly different numbers of reductions and much bigger memory > usage. > > > Please send the code and I will look at it. > > > > > > Matt > > > > > > > > > > > > > > > > Thanks for your help > > > > Milad > > > > > > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley > wrote: > > > > > > > > > > > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad > wrote: > > > > > > > > > > > Hello: > > > > > > I have two separate DA questions: > > > > > > > > > > > > 1) I am writing a large finite difference code and would like to > be > > > > > > able to represent an array of vectors. 
I am currently doing this > by > > > > > > creating a single DA and calling DACreateGlobalVector several > times, > > > > > > but the manual also states that: > > > > > > > > > > > > "PETSc currently provides no container for multiple arrays sharing > the > > > > > > same distributed array communication; note, however, that the dof > > > > > > parameter handles many cases of interest." > > > > > > > > > > > > I also found the following mailing list thread which describes how > to > > > > > > use the dof parameter to represent several vectors: > > > > > > > > > > > > > > > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html > > > > > > > > > > > > Where the following solution is proposed: > > > > > > """ > > > > > > The easiest thing to do in C is to declare a struct: > > > > > > > > > > > > typedef struct { > > > > > > PetscScalar v[3]; > > > > > > PetscScalar p; > > > > > > } Space; > > > > > > > > > > > > and then cast pointers > > > > > > > > > > > > Space ***array; > > > > > > > > > > > > DAVecGetArray(da, u, (void *) &array); > > > > > > > > > > > > array[k][j][i].v *= -1.0; > > > > > > """ > > > > > > > > > > > > The problem with the proposed solution, is that they use a struct > to > > > > > > get the individual values, but what if you don't know the number > of > > > > > > degrees of freedom at compile time? > > > > > > > > > > > > > > > > It would be nice to get variable structs in C. However, you can just > deference > > > > > the object directly. For example, for 50 degrees of freedom, you can > do > > > > > > > > > > array[k][j][i][47] *= -1.0; > > > > > > > > > > > > > > > > > > > > > So my question is two fold: > > > > > > a) Is there a problem with just having a single DA and calling > > > > > > DACreateGlobalVector multiple times? Does this affect performance > at > > > > > > all (I have many different vectors)? > > > > > > > > > > > > > > > > These are all independent objects. Thus, by itself, creating any > number of > > > > > Vecs does nothing to performance (unless you start to run out of > memory). > > > > > > > > > > > > > > > > > > > > > b) Is there a way to use the dof parameter when creating a DA when > the > > > > > > number of degrees of freedom is not known at compile time? > > > > > > Specifically, I would like to be able to access the individual > values > > > > > > of the vector, just like the example shows... > > > > > > > > > > > > > > > > > > > > > see above. > > > > > > > > > > > > > > > > 2) The code I am writing has a lot of different parts which > present a > > > > > > lot of opportunities to overlap communication an computation when > > > > > > scattering vectors to update values in the ghost points. Right > now, > > > > > > all of my vectors (there are ~50 of them) share a single DA > because > > > > > > they all have the same shape. However, by sharing a single DA, I > can > > > > > > only scatter one vector at a time. It would be nice to be able to > > > > > > start scattering each vector right after I'm done computing it, > and > > > > > > finish scattering it right before I need it again but I can't > because > > > > > > other vectors might need to be scattered in between. I then > re-wrote > > > > > > part of my code so that each vector had its own DA object, but > this > > > > > > ended up being incredibly slow (I assume this is because I have so > > > > > > many vectors). > > > > > > > > > > > > > > > > The problem here is that buffering will have to be done for each > outstanding > > > > > scatter. 
Thus I see two resolutions: > > > > > > > > > > 1) Duplicate the DA scatter for as many Vecs as you wish to scatter > at once. > > > > > This is essentially what you accomplish with separate DAs. > > > > > > > > > > 2) You the dof method. However, this scatter ALL the vectors every > time. > > > > > > > > > > I do not understand what performance problem you would have with > multiple > > > > > DAs. With any performance questions, we suggest sending the output > of > > > > > -log_summary so we have data to look at. > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > > > > > My question is, is there a way to scatter multiple vectors > > > > > > simultaneously without affecting the performance of the code? Does > it > > > > > > make sense to do this? > > > > > > > > > > > > > > > > > > I'd really appreciate any help... > > > > > > > > > > > > Thanks > > > > > > Milad Fatenejad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > From balay at mcs.anl.gov Tue May 13 10:55:26 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 13 May 2008 10:55:26 -0500 (CDT) Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from ["Lars Rindorf" ] (fwd) Message-ID: From Lars.Rindorf at teknologisk.dk Tue May 13 08:49:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 15:49:19 +0200 Subject: Parallel petsc with external UMFPACK Message-ID: Hi I'm thinking about using petsc to solve a linear system (Ax=b) using parallelization on a couple of linux computers. It is very important for my system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative solvers perform very poorly. My question is: I can see that UMFPACK is not listed (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab le.html) on the petsc page. Are there any plans to expand petsc to also include petsc? The UMFPACK homepage says that there exist a parallization for UMFPACK by Steve Hadfield. Can his parallelization be used with petsc? I must add, that although I like numerics and maths I do not intend to program 'from scratch'. Kind regards Lars Rindorf _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue May 13 11:06:50 2008 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 13 May 2008 11:06:50 -0500 (CDT) Subject: Subject: Parallel petsc with external UMFPACK In-Reply-To: References: Message-ID: > From: "Lars Rindorf" > To: > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important for > my system (electromagnetics) to use a direct solver, such as UMFPACK. > Iterative solvers perform very poorly. 
> > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab > le.html) on the petsc page. Are there any plans to expand petsc to also > include petsc? Its listed there. However it says parallel/complex support is not available. However - if you need a parallel direct solver - you might explore MUMPS. The other alternatives are SuperLU_DIST, spooles. Satish > The UMFPACK homepage says that there exist a parallization for UMFPACK > by Steve Hadfield. Can his parallelization be used with petsc? > > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf From Lars.Rindorf at teknologisk.dk Tue May 13 11:40:03 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 18:40:03 +0200 Subject: SV: Subject: Parallel petsc with external UMFPACK In-Reply-To: Message-ID: Hi Satish I'm sorry. My text should have read, that UMFPACK is listed as not available in parallel. My fault. MUMPS sounds very interesting being a multifrontal solver. I'll try it out. Thanks! KR, Lars -----Oprindelig meddelelse----- Fra: owner-petsc-users at mcs.anl.gov [mailto:owner-petsc-users at mcs.anl.gov] P? vegne af Satish Balay Sendt: 13. maj 2008 18:07 Til: petsc-users at mcs.anl.gov Emne: Re: Subject: Parallel petsc with external UMFPACK > From: "Lars Rindorf" > To: > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important > for my system (electromagnetics) to use a direct solver, such as UMFPACK. > Iterative solvers perform very poorly. > > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvert > ab > le.html) on the petsc page. Are there any plans to expand petsc to > also include petsc? Its listed there. However it says parallel/complex support is not available. However - if you need a parallel direct solver - you might explore MUMPS. The other alternatives are SuperLU_DIST, spooles. Satish > The UMFPACK homepage says that there exist a parallization for UMFPACK > by Steve Hadfield. Can his parallelization be used with petsc? > > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf From dalcinl at gmail.com Tue May 13 13:04:07 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 13 May 2008 15:04:07 -0300 Subject: PETSc and parallel direct solvers In-Reply-To: References: Message-ID: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. 
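A sketch of a solve routine that leaves the final choice to the options database, so that runtime options (including -ksp_view and the -matconvert_type recipe quoted further down) take effect; the calls follow the 2.3.3-era API and SolveDirect is an invented name.

#include "petscksp.h"

/* Direct solve of A x = b with the concrete factorization decided at
   run time from the options database. */
PetscErrorCode SolveDirect(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);  /* no Krylov iterations      */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);          /* LU as the "preconditioner" */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* -ksp_view, -pc_type, ...   */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Running with -ksp_view then prints which matrix type and factorization were actually used, which is exactly the check suggested above.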
Then you easily notice if you are using MUMPS or not. Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. Regards, > > Kind regards, Lars > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > Sendt: 13. maj 2008 18:54 > Til: Lars Rindorf > Emne: PETSc and parallel direct solvers > > > Dear lars, I saw you post to petsc-users, it bounced because you have to suscribe to the list > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > I usually do this to easy switch to use mumps. First, in the source code, after assembling your matrix, add the following > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > And then, when you actually run your program, add the following to the command line: > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type preonly -pc_type lu > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From Lars.Rindorf at teknologisk.dk Tue May 13 13:37:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 20:37:19 +0200 Subject: SV: PETSc and parallel direct solvers In-Reply-To: Message-ID: Dear Lisandro I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. Thanks. KR, Lars -----Oprindelig meddelelse----- Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] Sendt: 13. maj 2008 20:04 Til: Lars Rindorf Cc: petsc-users at mcs.anl.gov Emne: Re: PETSc and parallel direct solvers On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I have tried to compare MUMPS with UMFPACK for one of my systems. 
UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. Regards, > > Kind regards, Lars > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > Sendt: 13. maj 2008 18:54 > Til: Lars Rindorf > Emne: PETSc and parallel direct solvers > > > Dear lars, I saw you post to petsc-users, it bounced because you have > to suscribe to the list > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > I usually do this to easy switch to use mumps. First, in the source > code, after assembling your matrix, add the following > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > And then, when you actually run your program, add the following to the command line: > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > preonly -pc_type lu > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. 
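The recipe quoted above, condensed into code: everything here comes from Lisandro's description, with only the helper name invented, and it applies to petsc-2.3.3 as discussed.

#include "petscksp.h"

/* Lisandro's MatConvert step: call it once the matrix is fully
   assembled, before it is handed to the KSP.  With
   '-matconvert_type aijmumps' on the command line the matrix type, and
   with it the LU factorization, switches to MUMPS; without the option
   the call leaves the matrix unchanged. */
PetscErrorCode ConvertForRuntimeSolver(Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatConvert(*A, MATSAME, MAT_REUSE_MATRIX, A);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The matching run line from the recipe is: mpiexec -n <np> ./yourprogram -matconvert_type aijmumps -ksp_type preonly -pc_type lu, with -ksp_view added to confirm what was actually used.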
> > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From dalcinl at gmail.com Tue May 13 18:54:07 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 13 May 2008 20:54:07 -0300 Subject: PETSc and parallel direct solvers In-Reply-To: References: Message-ID: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. Indeed. Tell me your conclusions, I would love to know your results... > In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. > Well, the first authon in the 1989 reference seems to be the same that Patrick Amestoy here http://graal.ens-lyon.fr/MUMPS/index.php?page=credits. As i warned, the link is dated. Better to give a try yourself!.. Regards > Thanks. KR, Lars > > > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 20:04 > Til: Lars Rindorf > Cc: petsc-users at mcs.anl.gov > Emne: Re: PETSc and parallel direct solvers > > > On 5/13/08, Lars Rindorf wrote: > > Dear Lisandro > > > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? > > Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... > > To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. > > Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. > > I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. 
> > > Regards, > > > > > > Kind regards, Lars > > > > -----Oprindelig meddelelse----- > > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 18:54 > > Til: Lars Rindorf > > Emne: PETSc and parallel direct solvers > > > > > > Dear lars, I saw you post to petsc-users, it bounced because you have > > to suscribe to the list > > > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > > > I usually do this to easy switch to use mumps. First, in the source > > code, after assembling your matrix, add the following > > > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > > > And then, when you actually run your program, add the following to the command line: > > > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > > preonly -pc_type lu > > > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From Lars.Rindorf at teknologisk.dk Tue May 13 08:49:19 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Tue, 13 May 2008 15:49:19 +0200 Subject: Parallel petsc with external UMFPACK Message-ID: Hi I'm thinking about using petsc to solve a linear system (Ax=b) using parallelization on a couple of linux computers. It is very important for my system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative solvers perform very poorly. My question is: I can see that UMFPACK is not listed (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertab le.html) on the petsc page. Are there any plans to expand petsc to also include petsc? The UMFPACK homepage says that there exist a parallization for UMFPACK by Steve Hadfield. Can his parallelization be used with petsc? I must add, that although I like numerics and maths I do not intend to program 'from scratch'. Kind regards Lars Rindorf _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbostandoust at yahoo.com Tue May 13 23:15:47 2008 From: mbostandoust at yahoo.com (Mehdi Bostandoost) Date: Tue, 13 May 2008 21:15:47 -0700 (PDT) Subject: PETSc and parallel direct solvers In-Reply-To: Message-ID: <527863.5972.qm@web33507.mail.mud.yahoo.com> Hi In my master thesis,I needed to use PETSC direct solvers. Because of that I preparesd a short report. I attached the report to this email. note: the cluster that we had was not a good cluster and this report goes back to 4 years ago when I used petsc2.1.6. I thought it might be helpful. Regards Mehdi Lisandro Dalcin wrote: On 5/13/08, Lars Rindorf wrote: > Dear Lisandro > > I was also suspecting that petsc was using the default lu factorization. And, in fact, petsc returns 'type: lu' instead of 'type mumps'. So you are right. I will try again later with a linux computer to compare umfpack and mumps. Indeed. Tell me your conclusions, I would love to know your results... > In the comparison between umfpack and mumps you send me (http://istanbul.be.itu.edu.tr/~huseyin/doc/frontal/node12.html) umfpack and mumps are almost equal in performance (they spell it 'mups'. Their reference on 'mups' is from 1989, maybe mups is a predecessor of mumps). If they are almost equal, then mumps is good enough for my purposes. > Well, the first authon in the 1989 reference seems to be the same that Patrick Amestoy here http://graal.ens-lyon.fr/MUMPS/index.php?page=credits. As i warned, the link is dated. Better to give a try yourself!.. Regards > Thanks. KR, Lars > > > > -----Oprindelig meddelelse----- > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. maj 2008 20:04 > Til: Lars Rindorf > Cc: petsc-users at mcs.anl.gov > Emne: Re: PETSc and parallel direct solvers > > > On 5/13/08, Lars Rindorf wrote: > > Dear Lisandro > > > > I have tried to compare MUMPS with UMFPACK for one of my systems. UMFPACK is four times faster (134s) than MUMPS (581s). I did not add the 'MatConvert' line in my program. I've given up on cygwin, and I will receive a linux computer later in this week. Then I will try it again. Do you think that the missing 'MatConvert' line could cause the long calculation time? Or, rather, would including the missing line give a four fold enhancement of the MUMPS performance? > > Perhaps I'm missing something, but if you are using petsc-2.3.3 or below (in petsc-dev there is now a MatSetSoverType, I have not found the time to look at it), the if you do not convert the matrix to 'aijmumps' format, then I guess PETSc ended up using the default, PETSc-builting LU factorization, and not MUMPS at all !!... > > To be completelly sure about what your program is acutally using, add '-ksp_view' to the command line. Then you easily notice if you are using MUMPS or not. > > Finally, a disclaimer. I never tried UMFPACK, so I have no idea if it is actually faster or slower than MUMPS. But I want to make sure you are actually trying MUMPS. As you can see, selection LU solver in PETSc was a bit contrived, that's the reason Barry Smith reimplemented all this crap adding the MatSetSolverType() stuff. > > I'm posting this to petsc-users, please any PETSc developer/user correct me if I'm wrong in any of my above coments. I'm do not frequently use direct methods. > > > Regards, > > > > > > Kind regards, Lars > > > > -----Oprindelig meddelelse----- > > Fra: Lisandro Dalcin [mailto:dalcinl at gmail.com] > > Sendt: 13. 
maj 2008 18:54 > > Til: Lars Rindorf > > Emne: PETSc and parallel direct solvers > > > > > > Dear lars, I saw you post to petsc-users, it bounced because you have > > to suscribe to the list > > > > I never used UMFPACK, but I've tried MUMPS with PETSc, and it seems to work just fine. Could you give a try to see if it works for you? > > > > I usually do this to easy switch to use mumps. First, in the source > > code, after assembling your matrix, add the following > > > > MatConvert(A, MATSAME, MAT_REUSE_MATRIX, &A); > > > > And then, when you actually run your program, add the following to the command line: > > > > $ mpiexec -n ./yourprogram -matconvert_type aijmumps -ksp_type > > preonly -pc_type lu > > > > This way, you will actually use MUMPS if you pass the '-matconvert_type aijmumps' option. If you run sequentially and do not pass the matconvert option, then petsc will use their default LU factorization. Of course, you can also use MUMPS sequentially depending on your hardware and compiler optimizations MUMPS can be faster than PETSc-builtin linear solvers by a factor of two. > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Performance of PETSC direct solvers on the Beowulf Cluster.pdf Type: application/pdf Size: 127056 bytes Desc: 4129988398-Performance of PETSC direct solvers on the Beowulf Cluster.pdf URL: From knepley at gmail.com Wed May 14 06:58:42 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 May 2008 06:58:42 -0500 Subject: Parallel petsc with external UMFPACK In-Reply-To: References: Message-ID: On Tue, May 13, 2008 at 8:49 AM, Lars Rindorf wrote: > > > Hi > > I'm thinking about using petsc to solve a linear system (Ax=b) using > parallelization on a couple of linux computers. It is very important for my > system (electromagnetics) to use a direct solver, such as UMFPACK. Iterative > solvers perform very poorly. > > My question is: I can see that UMFPACK is not listed > (http://www-unix.mcs.anl.gov/petsc/petsc-2/documentation/linearsolvertable.html) > on the petsc page. Are there any plans to expand petsc to also include > petsc? > > The UMFPACK homepage says that there exist a parallization for UMFPACK by > Steve Hadfield. Can his parallelization be used with petsc? If you can find it, please send the URL. I looked this morning and could not locate this parallel UMFPACK on the web, which makes me very suspicious. 
We only like to wrap supported software, not one-off projects that continually break because no one is maintaining them. Matt > I must add, that although I like numerics and maths I do not intend to > program 'from scratch'. > > Kind regards > Lars Rindorf > > > > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From tsjb00 at hotmail.com Wed May 14 18:16:21 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Wed, 14 May 2008 23:16:21 +0000 Subject: question about VecLoad Message-ID: Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); i++; idx=i; AOApplicationToPetsc(ao,1,&idx); VecSetValue(v0,idx,var,INSERT_VALUES); } then use VecView to output the binary file: PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); VecView(v0,viewer); It's wrong. Please let me know how to fix it. Many thanks! JB _________________________________________________________________ ?????????live mail???????? http://get.live.cn/product/mail.html From tsjb00 at hotmail.com Wed May 14 18:39:42 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Wed, 14 May 2008 23:39:42 +0000 Subject: question about VecLoad (pls disregard previous one) Message-ID: Sorry the previous message is wrong Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); i++; idx=i; AOApplicationToPetsc(ao,1,&idx); VecSetValue(v0,idx,var,INSERT_VALUES); } then use VecView to output the binary file: PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); VecView(v0,viewer); Please let me know if something is wrong. Many thanks! JB _________________________________________________________________ MSN ???????????????????? http://cn.msn.com From tsjb00 at hotmail.com Wed May 14 19:46:38 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 00:46:38 +0000 Subject: question about VecLoad (attached ) Message-ID: Sorry the message gets messed up again! Please check the attached text file of my questions. Sorry for the inconvenience! Many thanks in advance! JB ---------------------------------------- > From: tsjb00 at hotmail.com > To: petsc-users at mcs.anl.gov > Subject: question about VecLoad (pls disregard previous one) > Date: Wed, 14 May 2008 23:39:42 +0000 > > > Sorry the previous message is wrong > > Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: > for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ > fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); > i++; > idx=i; > AOApplicationToPetsc(ao,1,&idx); > VecSetValue(v0,idx,var,INSERT_VALUES); > } > then use VecView to output the binary file: > PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); > VecView(v0,viewer); > > Please let me know if something is wrong. 
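The loop in the message above was mangled in transit (the for-statement conditions were stripped), so for readability here is a reconstruction of what it presumably looks like. nx, ny, nz, fp, ao, v0 and viewer are assumed to come from the poster's own setup; this is a guess at the intent, not the original code:

  PetscInt    i = -1, idx, k, j, l;
  PetscScalar var;
  float       dumx, dumy, dumz, fval;

  /* x runs fastest, then y, then z:
     p(x0,y0,z0), p(x1,y0,z0), ..., p(x0,y1,z0), p(x1,y1,z0), ... */
  for (k = 0; k < nz; k++) {
    for (j = 0; j < ny; j++) {
      for (l = 0; l < nx; l++) {
        fscanf(fp, "%f %f %f %f\n", &dumx, &dumy, &dumz, &fval);
        i++;
        idx = i;                            /* natural (application) index */
        AOApplicationToPetsc(ao, 1, &idx);  /* map to PETSc global index   */
        var = fval;                         /* %f reads a float, not a PetscScalar */
        VecSetValue(v0, idx, var, INSERT_VALUES);
      }
    }
  }
  VecAssemblyBegin(v0);
  VecAssemblyEnd(v0);

  /* then dump to a PETSc binary file, as in the original message */
  PetscViewerBinaryOpen(PETSC_COMM_SELF, "out.dat", FILE_MODE_WRITE, &viewer);
  VecView(v0, viewer);

Note that the VecAssemblyBegin/End pair is needed after the VecSetValue calls before the vector can be viewed.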
> > Many thanks! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > _________________________________________________________________ ?????????????MSN????TA????? http://im.live.cn/emoticons/?ID=18 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: question.txt URL: From bsmith at mcs.anl.gov Wed May 14 20:28:51 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 May 2008 20:28:51 -0500 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Since PETSc is used from C or Fortran you are free to write any kind of code you want that reads in ASCII files anyway you want. As you have done it is good to save the vectors with a binary viewer because they are easy to read and write; but again you can write whatever code you want. As to whether you code is wrong, no one can say, just run it and test it. Barry On May 14, 2008, at 6:39 PM, tsjb00 wrote: > > Sorry the previous message is wrong > > Hi! I have a question about VecLoad. In my program, I need to read > in ordered data from an input file, which in an ordinary c program > would be as: > for (k=0; ky->z,ie > p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ > fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); > i++; > idx=i; > AOApplicationToPetsc(ao,1,&idx); > VecSetValue(v0,idx,var,INSERT_VALUES); > } > then use VecView to output the binary file: > > PetscViewerBinaryOpen > (PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); > VecView(v0,viewer); > > Please let me know if something is wrong. > > Many thanks! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > From rafaelsantoscoelho at gmail.com Wed May 14 22:30:01 2008 From: rafaelsantoscoelho at gmail.com (Rafael Santos Coelho) Date: Thu, 15 May 2008 00:30:01 -0300 Subject: Something weird with SNES convergence reason Message-ID: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Hello everybody, I've coded a program which solves, in parallel, the three-dimensional Bratu problem. Afterwards, I've run tests in a cluster to see how it would go and, at first, it seemed ok to me, but then I've noticed that whenever I increased the number of processors (from 16 to 32, for example), the program started to diverge due to a failure in the Line Search Newton's Method. Here is what a monitoring function prints out: nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear iterations = 16 nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear iterations = 32 nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear iterations = 48 Nonlinear solve did not converge due to DIVERGED_LS_FAILURE Indeed, one can see that the method is really diverging (for smaller tests, though, say N = 8 * 8 * 8, it converges). What's wrong here? Is it something with my code? If yes, how can I fix it? Best regards, Rafael -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 15 07:07:46 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 07:07:46 -0500 Subject: question about VecLoad (attached ) In-Reply-To: References: Message-ID: You do not have to use the AO. We always save to file in the natural ordering. Without this, we would not be able to load on different numbers of processors. Matt 2008/5/14 tsjb00 : > > Sorry the message gets messed up again! 
Please check the attached text file of my questions. > > Sorry for the inconvenience! Many thanks in advance! > > JB > ---------------------------------------- >> From: tsjb00 at hotmail.com >> To: petsc-users at mcs.anl.gov >> Subject: question about VecLoad (pls disregard previous one) >> Date: Wed, 14 May 2008 23:39:42 +0000 >> >> >> Sorry the previous message is wrong >> >> Hi! I have a question about VecLoad. In my program, I need to read in ordered data from an input file, which in an ordinary c program would be as: >> for (k=0; ky->z,ie p(x0,y0,z0),p(x1,y0,z0),...,p(x0,y1,z0),p(x1,y1,z0),...*/ >> fscanf(fp,"%f %f %f %f\n",&dumx,&dumy,&dumz,&var); >> i++; >> idx=i; >> AOApplicationToPetsc(ao,1,&idx); >> VecSetValue(v0,idx,var,INSERT_VALUES); >> } >> then use VecView to output the binary file: >> PetscViewerBinaryOpen(PETSC_COMM_SELF,"out.dat",FILE_MODE_WRITE,&viewer); >> VecView(v0,viewer); >> >> Please let me know if something is wrong. >> >> Many thanks! >> >> JB >> >> _________________________________________________________________ >> MSN ???????????????????? >> http://cn.msn.com >> > > _________________________________________________________________ > ?????????????MSN????TA????? > http://im.live.cn/emoticons/?ID=18 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From knepley at gmail.com Thu May 15 07:28:40 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 07:28:40 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Message-ID: 1) Are the linear systems really being solved in Newton? 2) What is the Bratu parameter? Turn it off and see that you get convergence in 1 iteration. Matt On Wed, May 14, 2008 at 10:30 PM, Rafael Santos Coelho wrote: > Hello everybody, > > I've coded a program which solves, in parallel, the three-dimensional Bratu > problem. Afterwards, I've run tests in a cluster to see how it would go and, > at first, it seemed ok to me, but then I've noticed that whenever I > increased the number of processors (from 16 to 32, for example), the program > started to diverge due to a failure in the Line Search Newton's Method. Here > is what a monitoring function prints out: > > nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear iterations = > 16 > nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear iterations = > 32 > nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear iterations = > 48 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Indeed, one can see that the method is really diverging (for smaller tests, > though, say N = 8 * 8 * 8, it converges). > > What's wrong here? Is it something with my code? If yes, how can I fix it? > > Best regards, > > Rafael > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From bsmith at mcs.anl.gov Thu May 15 09:04:52 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 May 2008 09:04:52 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> Message-ID: <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> run with -ksp_monitor -ksp_converged_reason to see how the linear solver is working. Also try adding -ksp_rtol 1.e-10 to see if solving the linear system more accurately helps (it really shouldn't matter). You can also run with -info to get a lot more detailed information about the nonlinear solve and the line search. This is suppose to be an easy nonlinear problem so I would expect it to converge easily, there may be a slight error with your FormFunction() that starts to matter only for larger problems. Barry On May 14, 2008, at 10:30 PM, Rafael Santos Coelho wrote: > Hello everybody, > > I've coded a program which solves, in parallel, the three- > dimensional Bratu problem. Afterwards, I've run tests in a cluster > to see how it would go and, at first, it seemed ok to me, but then > I've noticed that whenever I increased the number of processors > (from 16 to 32, for example), the program started to diverge due to > a failure in the Line Search Newton's Method. Here is what a > monitoring function prints out: > > nonlinear iteration number = 1, norm(F(x)) = 1013.53, linear > iterations = 16 > nonlinear iteration number = 2, norm(F(x)) = 1013.33, linear > iterations = 32 > nonlinear iteration number = 3, norm(F(x)) = 1013.33, linear > iterations = 48 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Indeed, one can see that the method is really diverging (for smaller > tests, though, say N = 8 * 8 * 8, it converges). > > What's wrong here? Is it something with my code? If yes, how can I > fix it? > > Best regards, > > Rafael > From tsjb00 at hotmail.com Thu May 15 10:29:03 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 15:29:03 +0000 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Many thanks to all for the reply! I use the code: PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); VecLoad(viewer,VECMPI,&v1); to get the vector storing values of var. My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? Thanks in advance! JB _________________________________________________________________ MSN ???????????????????? http://cn.msn.com From knepley at gmail.com Thu May 15 10:45:24 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 10:45:24 -0500 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: 2008/5/15 tsjb00 : > > Many thanks to all for the reply! > > I use the code: > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); > VecLoad(viewer,VECMPI,&v1); > to get the vector storing values of var. > > My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? The output vector, using VecView(), is in natural order. 
It has no idea about distribution over processes. When it is read in, using VecLoad(), it is redistributed according to the current partition. Matt > Thanks in advance! > > JB > > _________________________________________________________________ > MSN ???????????????????? > http://cn.msn.com > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From pflath at ices.utexas.edu Thu May 15 11:04:28 2008 From: pflath at ices.utexas.edu (Pearl Flath) Date: Thu, 15 May 2008 11:04:28 -0500 Subject: accessing iterative vectors in CG Message-ID: Hi, I'd like to do some additional calculations for another purpose with the iterative vectors in each step of the KSP CG solve. How do I get access to them? Pearl Flath -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 15 11:10:42 2008 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 May 2008 11:10:42 -0500 Subject: accessing iterative vectors in CG In-Reply-To: References: Message-ID: On Thu, May 15, 2008 at 11:04 AM, Pearl Flath wrote: > Hi, > I'd like to do some additional calculations for another purpose with the > iterative vectors in each step of the KSP CG solve. How do I get access to > them? We do not expose those vectors to the user in the interface. I think the easiest thing to do is copy the cg.c code into another solver, mycg.c and do the requisite calculations in that one. You register the solver in the same way as CG is registered in src/ksp/ksp/interface/itregis.c with a call to KSPRegisterDynamic(). Then from the command-line you can use -ksp_type mycg. Matt > Pearl Flath -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From dalcinl at gmail.com Thu May 15 11:38:19 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 15 May 2008 13:38:19 -0300 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Have you give a look to VecLoadIntoVector() ? For use it, you have to previously create the vector with the desired distribution (or get it from DA's), and then the values will be read and used to fill your vector... Of couse, you have to get shure that the ordering of indices is the same (and I believe it should be the 'natural' DA ordering), this is specially important if you run your problem with different number of processes. On 5/15/08, tsjb00 wrote: > > Many thanks to all for the reply! > > I use the code: > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); > VecLoad(viewer,VECMPI,&v1); > to get the vector storing values of var. > > My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? > > Thanks in advance! > > > JB > > _________________________________________________________________ > MSN ???????????????????? 
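A minimal sketch of the VecLoadIntoVector() approach described above, assuming the vector layout comes from the poster's DA (da, viewer and v1 are placeholders, and the names follow the petsc-2.3.x API):

  Vec         v1;
  PetscViewer viewer;

  DACreateGlobalVector(da, &v1);   /* v1 now has the DA's parallel layout    */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "out.dat", FILE_MODE_READ, &viewer);
  VecLoadIntoVector(viewer, v1);   /* fills v1 instead of creating a new Vec */
  PetscViewerDestroy(viewer);      /* newer releases take &viewer            */

As noted above, this assumes the ordering of the values in the file matches the DA's natural ordering.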
> http://cn.msn.com > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bsmith at mcs.anl.gov Thu May 15 11:43:08 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 May 2008 11:43:08 -0500 Subject: accessing iterative vectors in CG In-Reply-To: References: Message-ID: <006BE129-08AE-4A54-88E8-253EA69BA658@mcs.anl.gov> On May 15, 2008, at 11:10 AM, Matthew Knepley wrote: > On Thu, May 15, 2008 at 11:04 AM, Pearl Flath > wrote: >> Hi, >> I'd like to do some additional calculations for another purpose >> with the >> iterative vectors in each step of the KSP CG solve. How do I get >> access to >> them? > > We do not expose those vectors to the user in the interface. I think > the easiest > thing to do is copy the cg.c code into another solver, mycg.c and do > the requisite > calculations in that one. > > You register the solver in the same way as CG is registered in > src/ksp/ksp/interface/itregis.c > with a call to KSPRegisterDynamic(). Then from the command-line you > can use -ksp_type mycg. > > Matt > If you merely want to access the current solution, residual at each iteration when the convergence test or monitoring is done you could just provide your own monitor or convergence test. Barry >> Pearl Flath > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > From tsjb00 at hotmail.com Thu May 15 12:09:37 2008 From: tsjb00 at hotmail.com (tsjb00) Date: Thu, 15 May 2008 17:09:37 +0000 Subject: question about VecLoad (pls disregard previous one) In-Reply-To: References: Message-ID: Many thanks for the reply! This seems to work. ---------------------------------------- > Date: Thu, 15 May 2008 13:38:19 -0300 > From: dalcinl at gmail.com > To: petsc-users at mcs.anl.gov > Subject: Re: question about VecLoad (pls disregard previous one) > > Have you give a look to VecLoadIntoVector() ? For use it, you have to > previously create the vector with the desired distribution (or get it > from DA's), and then the values will be read and used to fill your > vector... Of couse, you have to get shure that the ordering of indices > is the same (and I believe it should be the 'natural' DA ordering), > this is specially important if you run your problem with different > number of processes. > > On 5/15/08, tsjb00 wrote: >> >> Many thanks to all for the reply! >> >> I use the code: >> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"out.dat",FILE_MODE_READ,viewer); >> VecLoad(viewer,VECMPI,&v1); >> to get the vector storing values of var. >> >> My problem is, it seems that the vector global is evenly distributed over processors; while in my program, a DA is defined and according to DA global vectors are not distributed the same way. Would anybody please tell me how to deal with that? >> >> Thanks in advance! >> >> >> JB >> >> _________________________________________________________________ >> MSN ???????????????????? 
>> http://cn.msn.com >> >> > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > _________________________________________________________________ ?????????????MSN????TA????? http://im.live.cn/emoticons/?ID=18 From rafaelsantoscoelho at gmail.com Thu May 15 21:21:56 2008 From: rafaelsantoscoelho at gmail.com (Rafael Santos Coelho) Date: Thu, 15 May 2008 23:21:56 -0300 Subject: Something weird with SNES convergence reason In-Reply-To: <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> Message-ID: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Hi people, thank you very much for the help. I couldn't fix the problem though... Matthew: 1) I guess so because for the vast majority of the tests carried out, the method converges and you can actually observe norm(F(x)) decreasing with few Newton iterations. For example: $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 -ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor 0 KSP Residual norm 5.960419091967e-01 1 KSP Residual norm 1.235318806330e+00 (...) Linear solve converged due to CONVERGED_RTOL iterations 56 0 KSP Residual norm 2.990541631546e-02 1 KSP Residual norm 1.332572441021e-02 (...) 29 KSP Residual norm 3.225505605549e-07 30 KSP Residual norm 1.658059885118e-07 Linear solve converged due to CONVERGED_RTOL iterations 30 0 KSP Residual norm 7.629434752036e-05 1 KSP Residual norm 1.413056976255e-05 (...) 21 KSP Residual norm 1.183900079277e-09 22 KSP Residual norm 6.010910804534e-10 Linear solve converged due to CONVERGED_RTOL iterations 22 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = 0 and u = 0 in all domain boundaries. u_x stands for the partial derivative of u with respect to the x variable, where u = u(x, y, z). For all the tests I made, d = 16 and lambda = 32. I tried setting d = 0, but the error continued. Barry: Consider $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 -ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor Here's the output: 0 KSP Residual norm 5.960419091967e-01 1 KSP Residual norm 1.235318806330e+00 (...) 55 KSP Residual norm 7.533575286046e-06 56 KSP Residual norm 4.924747432423e-06 Linear solve converged due to CONVERGED_RTOL iterations 56 0 KSP Residual norm 5.899667305071e-01 1 KSP Residual norm 1.233037780509e+00 (...) 56 KSP Residual norm 9.299650766487e-06 57 KSP Residual norm 5.541388445894e-06 Linear solve converged due to CONVERGED_RTOL iterations 57 0 KSP Residual norm 5.898541843665e-01 1 KSP Residual norm 1.230515227262e+00 (...) 57 KSP Residual norm 6.065473514455e-06 58 KSP Residual norm 3.255910272791e-06 Linear solve converged due to CONVERGED_RTOL iterations 58 Nonlinear solve did not converge due to DIVERGED_LS_FAILURE Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only difference is that the number of linear iterations per nonlinear iteration gets bigger (as one might have expected). I'm using the classic 7-point stencil finite difference approximation to discretize the PDE... 
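For concreteness, the interior-point residual for the PDE above with the 7-point stencil looks something like the fragment below (uniform spacings hx, hy, hz; x and f are the DA arrays of the current iterate and the residual). This is only an illustration of the discretization being described, not the poster's actual code:

  PetscScalar uxx, uyy, uzz, ux;

  uxx = (2.0*x[k][j][i] - x[k][j][i-1] - x[k][j][i+1]) / (hx*hx);
  uyy = (2.0*x[k][j][i] - x[k][j-1][i] - x[k][j+1][i]) / (hy*hy);
  uzz = (2.0*x[k][j][i] - x[k-1][j][i] - x[k+1][j][i]) / (hz*hz);
  ux  = (x[k][j][i+1] - x[k][j][i-1]) / (2.0*hx);          /* convection term */

  /* -Laplacian(u) + d*u_x - lambda*exp(u) at node (i,j,k) */
  f[k][j][i] = uxx + uyy + uzz + d*ux - lambda*exp(x[k][j][i]);

A hand-coded Jacobian for the same stencil has 2/hx^2 + 2/hy^2 + 2/hz^2 - lambda*exp(u_ijk) on the diagonal and -1/hx^2 + d/(2*hx), -1/hx^2 - d/(2*hx) on the east and west neighbours; the exp term and the convection term are the usual places for sign mistakes.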
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 16 07:46:47 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 07:46:47 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Message-ID: DIVERGED_LS_FAILURE means that the direction computed by solving - J(u^n)^{-1} F(u^n) is NOT a descent direction that is F(u^n - lambda * J(u^n)^{-1} F(u^n)) is not smaller than F(u^n) for 0 < lambda < 1. Since you are solving the linear system (in your last test) very accurately this almost always indicates the Jacobian is wrong. Please run with -snes_type test and see what it says. Also recheck your Jacobian code. If this does not help then take a look at src/snes/examples/tuturials/ex5.c and see how it can be run with fd_jacobian; you can try that with you code to track down any errors in the Jacobian. Barry On May 15, 2008, at 9:21 PM, Rafael Santos Coelho wrote: > Hi people, > > thank you very much for the help. I couldn't fix the problem though... > > Matthew: > > 1) I guess so because for the vast majority of the tests carried > out, the method converges and you can actually observe norm(F(x)) > decreasing with few Newton iterations. For example: > > $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 - > ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type > jacobi -ksp_monitor > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 2.990541631546e-02 > 1 KSP Residual norm 1.332572441021e-02 > (...) > 29 KSP Residual norm 3.225505605549e-07 > 30 KSP Residual norm 1.658059885118e-07 > Linear solve converged due to CONVERGED_RTOL iterations 30 > 0 KSP Residual norm 7.629434752036e-05 > 1 KSP Residual norm 1.413056976255e-05 > (...) > 21 KSP Residual norm 1.183900079277e-09 > 22 KSP Residual norm 6.010910804534e-10 > > Linear solve converged due to CONVERGED_RTOL iterations 22 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = > 0 and u = 0 in all domain boundaries. u_x stands for the partial > derivative of u with respect to the x variable, where u = u(x, y, > z). For all the tests I made, d = 16 and lambda = 32. I tried > setting d = 0, but the error continued. > > Barry: > > Consider > > $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 - > ksp_converged_reason -snes_converged_reason -ksp_type lcd -pc_type > jacobi -ksp_monitor > > Here's the output: > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > 55 KSP Residual norm 7.533575286046e-06 > 56 KSP Residual norm 4.924747432423e-06 > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 5.899667305071e-01 > 1 KSP Residual norm 1.233037780509e+00 > (...) > 56 KSP Residual norm 9.299650766487e-06 > 57 KSP Residual norm 5.541388445894e-06 > Linear solve converged due to CONVERGED_RTOL iterations 57 > 0 KSP Residual norm 5.898541843665e-01 > 1 KSP Residual norm 1.230515227262e+00 > (...) 
> 57 KSP Residual norm 6.065473514455e-06 > 58 KSP Residual norm 3.255910272791e-06 > > Linear solve converged due to CONVERGED_RTOL iterations 58 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only > difference is that the number of linear iterations per nonlinear > iteration gets bigger (as one might have expected). > > I'm using the classic 7-point stencil finite difference > approximation to discretize the PDE... > From knepley at gmail.com Fri May 16 09:42:50 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 09:42:50 -0500 Subject: Something weird with SNES convergence reason In-Reply-To: <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> References: <3b6f83d40805142030j2c3d9f58i5dd85c41da52d0db@mail.gmail.com> <5D95471E-C1C5-45C0-8657-79CE3DDA65D7@mcs.anl.gov> <3b6f83d40805151921j7f488f18ge4866afd3df18fa0@mail.gmail.com> Message-ID: On Thu, May 15, 2008 at 9:21 PM, Rafael Santos Coelho wrote: > Hi people, > > thank you very much for the help. I couldn't fix the problem though... > > Matthew: > > 1) I guess so because for the vast majority of the tests carried out, the > method converges and you can actually observe norm(F(x)) decreasing with few > Newton iterations. For example: > > $ mpirun -np 2 ./bratu_problem -N 16 -M 16 -P 16 -ksp_converged_reason > -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 2.990541631546e-02 > 1 KSP Residual norm 1.332572441021e-02 > (...) > 29 KSP Residual norm 3.225505605549e-07 > 30 KSP Residual norm 1.658059885118e-07 > Linear solve converged due to CONVERGED_RTOL iterations 30 > 0 KSP Residual norm 7.629434752036e-05 > 1 KSP Residual norm 1.413056976255e-05 > (...) > 21 KSP Residual norm 1.183900079277e-09 > 22 KSP Residual norm 6.010910804534e-10 > > Linear solve converged due to CONVERGED_RTOL iterations 22 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > 2) The governing PDE is -Laplacian(u) + d * u_x -lambda * exp(u) = 0 and u > = 0 in all domain boundaries. u_x stands for the partial derivative of u > with respect to the x variable, where u = u(x, y, z). For all the tests I > made, d = 16 and lambda = 32. I tried setting d = 0, but the error > continued. There is a real problem with d past the birfurcation point. Make d < 6 and run again. Also, your code is wrong if d == 0 is a problem. Matt > Barry: > > Consider > > $ mpirun -np 8 ./bratu_problem -x 16 -y 16 -z 16 -ksp_converged_reason > -snes_converged_reason -ksp_type lcd -pc_type jacobi -ksp_monitor > > Here's the output: > > 0 KSP Residual norm 5.960419091967e-01 > 1 KSP Residual norm 1.235318806330e+00 > (...) > 55 KSP Residual norm 7.533575286046e-06 > 56 KSP Residual norm 4.924747432423e-06 > Linear solve converged due to CONVERGED_RTOL iterations 56 > 0 KSP Residual norm 5.899667305071e-01 > 1 KSP Residual norm 1.233037780509e+00 > (...) > 56 KSP Residual norm 9.299650766487e-06 > 57 KSP Residual norm 5.541388445894e-06 > Linear solve converged due to CONVERGED_RTOL iterations 57 > 0 KSP Residual norm 5.898541843665e-01 > 1 KSP Residual norm 1.230515227262e+00 > (...) 
> 57 KSP Residual norm 6.065473514455e-06 > 58 KSP Residual norm 3.255910272791e-06 > > Linear solve converged due to CONVERGED_RTOL iterations 58 > Nonlinear solve did not converge due to DIVERGED_LS_FAILURE > > Now, if I use -ksp_rtol 1.e-10, same thing occurs, the only difference is > that the number of linear iterations per nonlinear iteration gets bigger (as > one might have expected). > > I'm using the classic 7-point stencil finite difference approximation to > discretize the PDE... -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gdiso at ustc.edu Fri May 16 10:47:12 2008 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 16 May 2008 23:47:12 +0800 Subject: mesh ordering and partition Message-ID: Hi, I am studying the parallel programming, using libmesh/petsc as an excellent example. I have some questions about the partition and mesh ordering. It seems libmesh does not reorder the mesh nodes. It only calls metis to partition the mesh, and uses original node order to build the matrix. I wonder if a bad mesh ordering may cause low efficency of ILU preconditioner. However, If I did RCM ordering to the mesh, the node's order may conflict with contiguous index set int the subdomain partitioned by metis. How should I balance the ordering (to reduce filling) and partition (to reduce communication)? any good ideas? Regards, Gong Ding From knepley at gmail.com Fri May 16 11:19:57 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 11:19:57 -0500 Subject: mesh ordering and partition In-Reply-To: References: Message-ID: On Fri, May 16, 2008 at 10:47 AM, Gong Ding wrote: > Hi, > I am studying the parallel programming, using libmesh/petsc as an excellent > example. > > I have some questions about the partition and mesh ordering. > > > > It seems libmesh does not reorder the mesh nodes. It only calls metis to > partition the mesh, and > > uses original node order to build the matrix. I wonder if a bad mesh > ordering may cause low efficency > > of ILU preconditioner. However, If I did RCM ordering to the mesh, the > node's order may conflict with > > contiguous index set int the subdomain partitioned by metis. How should I > balance the ordering (to reduce filling) > > and partition (to reduce communication)? any good ideas? I think, if you are using the serial PETSc ILU, you should just use a MatOrdering, which can be done from the command line: -pc_type ilu -pc_factor_mat_ordering_type rcm which I tested on KSP ex2. Matt > Regards, > > Gong Ding > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stephane.aubert at fluorem.com Fri May 16 11:22:18 2008 From: stephane.aubert at fluorem.com (Stephane Aubert) Date: Fri, 16 May 2008 18:22:18 +0200 Subject: GMRES left-preconditioned with ILU1 versus ILU0 Message-ID: <482DB4BA.70406@fluorem.com> Hi, I tried to improve the convergence of a rather badly conditioned linear system by increasing the ILU(k) level from k=0 to k=1. And, whereas I got a convergence for k=0, I ended up with an explosive divergence for k=1!!! The PETCS version is 2.3.2-p8. 
The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty blocks=297454) The common command line options are: * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart to get the condition number of the pre-conditioned system (if I understand correctly the man page of -ksp_singmonitor) * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more than 1 partition, but for the time being, only one partition is used. * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU For ILU(0), I'm using: * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks" and I got for convergence: 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/min 1 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/min 178.789 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/min 300.174 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/min 495.148 .... 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min 0.00612973 max/min 2.97075e+06 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min 0.00612913 max/min 2.97388e+06 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 max/min 2.97489e+06 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min 0.00612856 max/min 2.97538e+06 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min 0.00612594 max/min 2.98378e+06 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min 0.00612475 max/min 2.98437e+06 and the iterative solution compares very well with the one computed using complete LU factorization. For ILU(1), I'm using: * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill value. and I got for "convergence": 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min 1.68012e+135 max/min 1 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min 3.55953e+131 max/min 17825.9 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min 1.42732e+127 max/min 8.79538e+08 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min 4.82369e+125 max/min 2.99506e+10 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min 1.79839e+125 max/min 8.1095e+10 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min 5.16422e+122 max/min 2.82408e+13 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min 3.67337e+122 max/min 3.97521e+13 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e+122 max/min 1.42886e+14 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min 4.93314e+121 max/min 2.96237e+14 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min 4.89675e+121 max/min 3.00694e+14 My guess is that the ILU(1) is singular (zero as diagonal elements?), but I thought that the options "-sub_pc_factor_shift_nonzero -sub_pc_factor_pivot_in_blocks" were taking care of that... 
I got lost in the source files trying to find out who at the end is computing and applying the ILU for MPIBAIJ format (replaced by SEQBAIJ with only one partition, I'm guessing). The question is: What am I doing wrong? I never heard that ILU(1) was worst than ILU(0)! Stef. -- ___________________________________________________________ Dr. Stephane AUBERT, CEO & CTO FLUOREM s.a.s Centre Scientifique Auguste MOIROUX 64 chemin des MOUILLES F-69130 ECULLY, FRANCE International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 email: stephane.aubert at fluorem.com web: www.fluorem.com From knepley at gmail.com Fri May 16 11:34:43 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 11:34:43 -0500 Subject: GMRES left-preconditioned with ILU1 versus ILU0 In-Reply-To: <482DB4BA.70406@fluorem.com> References: <482DB4BA.70406@fluorem.com> Message-ID: On Fri, May 16, 2008 at 11:22 AM, Stephane Aubert wrote: > Hi, > I tried to improve the convergence of a rather badly conditioned linear > system by increasing the ILU(k) level from k=0 to k=1. > And, whereas I got a convergence for k=0, I ended up with an explosive > divergence for k=1!!! > The PETCS version is 2.3.2-p8. > The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty > blocks=297454) > The common command line options are: > > * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 > -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt > -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor > -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart > to get the condition number of the pre-conditioned system (if I > understand correctly the man page of -ksp_singmonitor) > * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more > than 1 partition, but for the time being, only one partition is used. > * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU > > For ILU(0), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm -sub_pc_factor_pivot_in_blocks" > > and I got for convergence: > 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 > 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/min 1 > 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/min > 178.789 > 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/min > 300.174 > 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/min > 495.148 > .... > 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min 0.00612973 > max/min 2.97075e+06 > 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min 0.00612913 > max/min 2.97388e+06 > 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 max/min > 2.97489e+06 > 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min 0.00612856 > max/min 2.97538e+06 > 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min 0.00612594 > max/min 2.98378e+06 > 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min 0.00612475 > max/min 2.98437e+06 > > and the iterative solution compares very well with the one computed using > complete LU factorization. > > For ILU(1), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 > -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm > -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill value. 
> > and I got for "convergence": > 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 > 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min 1.68012e+135 > max/min 1 > 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min 3.55953e+131 > max/min 17825.9 > 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min 1.42732e+127 > max/min 8.79538e+08 > 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min 4.82369e+125 > max/min 2.99506e+10 > 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min 1.79839e+125 > max/min 8.1095e+10 > 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min 5.16422e+122 > max/min 2.82408e+13 > 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min 3.67337e+122 > max/min 3.97521e+13 > 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e+122 > max/min 1.42886e+14 > 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min 4.93314e+121 > max/min 2.96237e+14 > 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min 4.89675e+121 > max/min 3.00694e+14 > > My guess is that the ILU(1) is singular (zero as diagonal elements?), but I > thought that the options "-sub_pc_factor_shift_nonzero > -sub_pc_factor_pivot_in_blocks" were taking care of that... I got lost in > the source files trying to find out who at the end is computing and applying > the ILU for MPIBAIJ format (replaced by SEQBAIJ with only one partition, I'm > guessing). > > The question is: What am I doing wrong? I never heard that ILU(1) was worst > than ILU(0)! It is not uncommon. There are no theoretical guarantees for ILU(k), which is why I dislike it so much. ILU(1) can indeed be worse than ILU(0), depending on the type of matrix you have and the iterative solver used. Matt > Stef. > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Fri May 16 11:42:47 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 11:42:47 -0500 Subject: GMRES left-preconditioned with ILU1 versus ILU0 In-Reply-To: <482DB4BA.70406@fluorem.com> References: <482DB4BA.70406@fluorem.com> Message-ID: When monkeying with ILU it is always useful to run tests with - ksp_monitor_true_residual, because bad pivots in ILU can produce wildly scaled preconditioners so the preconditioned residual can be at a different scale then the actual residual. The block versions of the factorizations don't support the pc_factor_shiftXXXX options. You could try to add it. That is one nasty matrix, I don't know if you'll ever get reasonable performance with ILU. You may want to stick to direct solvers on each process (it may be slow and use lots of memory but at least it might work). Barry On May 16, 2008, at 11:22 AM, Stephane Aubert wrote: > Hi, > I tried to improve the convergence of a rather badly conditioned > linear system by increasing the ILU(k) level from k=0 to k=1. > And, whereas I got a convergence for k=0, I ended up with an > explosive divergence for k=1!!! > The PETCS version is 2.3.2-p8. 
> The matrix type is MPIBAIJ (block size=7,mesh points=23010,non-empty > blocks=297454) > The common command line options are: > > * KSP="-ksp_type gmres -ksp_max_it 800 -ksp_gmres_restart 800 > -ksp_rtol 1.0e-12 -ksp_left_pc -ksp_gmres_modifiedgramschmidt > -ksp_gmres_cgs_refinement_type REFINE_NEVER -ksp_singmonitor > -ksp_compute_eigenvalues": I'm forcing 800 krylovs without restart > to get the condition number of the pre-conditioned system (if I > understand correctly the man page of -ksp_singmonitor) > * PC="-pc_type asm -pc_asm_overlap 2": I'm planning to run with more > than 1 partition, but for the time being, only one partition is > used. > * BLK_KSP="-sub_ksp_type preonly": Because of GMRES+ILU > > For ILU(0), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.00 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm - > sub_pc_factor_pivot_in_blocks" > > and I got for convergence: > 0 KSP Residual norm 1.687258996558e+00 % max 1 min 1 max/min 1 > 1 KSP Residual norm 1.687132576728e+00 % max 67.8829 min 67.8829 max/ > min 1 > 2 KSP Residual norm 1.685760733293e+00 % max 3496.78 min 19.5582 max/ > min 178.789 > 3 KSP Residual norm 1.668552043995e+00 % max 3604.26 min 12.0073 max/ > min 300.174 > 4 KSP Residual norm 1.578511835381e+00 % max 3639.92 min 7.35118 max/ > min 495.148 > .... > 795 KSP Residual norm 1.465607932165e-09 % max 18209.9 min > 0.00612973 max/min 2.97075e+06 > 796 KSP Residual norm 1.390602265424e-09 % max 18227.3 min > 0.00612913 max/min 2.97388e+06 > 797 KSP Residual norm 1.320529491862e-09 % max 18231.9 min 0.0061286 > max/min 2.97489e+06 > 798 KSP Residual norm 1.253371917713e-09 % max 18234.8 min > 0.00612856 max/min 2.97538e+06 > 799 KSP Residual norm 1.188955299647e-09 % max 18278.5 min > 0.00612594 max/min 2.98378e+06 > 800 KSP Residual norm 1.118294486519e-09 % max 18278.5 min > 0.00612475 max/min 2.98437e+06 > > and the iterative solution compares very well with the one computed > using complete LU factorization. > > For ILU(1), I'm using: > > * BLK_PC="-sub_pc_type ilu -sub_pc_factor_levels 1 > -sub_pc_factor_fill 3.81 -sub_pc_factor_shift_nonzero > -sub_pc_factor_mat_ordering_type rcm > -sub_pc_factor_pivot_in_blocks": RCM gives the smallest fill > value. 
> > and I got for "convergence": > 0 KSP Residual norm 7.095990612421e+126 % max 1 min 1 max/min 1 > 1 KSP Residual norm 3.313547979190e+123 % max 1.68012e+135 min > 1.68012e+135 max/min 1 > 2 KSP Residual norm 1.257750994639e+119 % max 6.34518e+135 min > 3.55953e+131 max/min 17825.9 > 3 KSP Residual norm 5.233083258710e+118 % max 1.25538e+136 min > 1.42732e+127 max/min 8.79538e+08 > 4 KSP Residual norm 1.938981257595e+118 % max 1.44472e+136 min > 4.82369e+125 max/min 2.99506e+10 > 5 KSP Residual norm 3.371270926617e+116 % max 1.45841e+136 min > 1.79839e+125 max/min 8.1095e+10 > 6 KSP Residual norm 2.179293254483e+115 % max 1.45842e+136 min > 5.16422e+122 max/min 2.82408e+13 > 7 KSP Residual norm 2.120598006522e+115 % max 1.46024e+136 min > 3.67337e+122 max/min 3.97521e+13 > 8 KSP Residual norm 1.486820601733e+115 % max 1.461e+136 min 1.02249e > +122 max/min 1.42886e+14 > 9 KSP Residual norm 7.653834441859e+114 % max 1.46138e+136 min > 4.93314e+121 max/min 2.96237e+14 > 10 KSP Residual norm 5.001204920218e+114 % max 1.47243e+136 min > 4.89675e+121 max/min 3.00694e+14 > > My guess is that the ILU(1) is singular (zero as diagonal > elements?), but I thought that the options "- > sub_pc_factor_shift_nonzero -sub_pc_factor_pivot_in_blocks" were > taking care of that... I got lost in the source files trying to find > out who at the end is computing and applying the ILU for MPIBAIJ > format (replaced by SEQBAIJ with only one partition, I'm guessing). > > The question is: What am I doing wrong? I never heard that ILU(1) > was worst than ILU(0)! > Stef. > > -- > ___________________________________________________________ > Dr. Stephane AUBERT, CEO & CTO > FLUOREM s.a.s > Centre Scientifique Auguste MOIROUX > 64 chemin des MOUILLES > F-69130 ECULLY, FRANCE > International: fax: +33 4.78.33.99.39 tel: +33 4.78.33.99.35 > France: fax: 04.78.33.99.39 tel: 04.78.33.99.35 > email: stephane.aubert at fluorem.com > web: www.fluorem.com > > > From gdiso at ustc.edu Fri May 16 11:47:06 2008 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 17 May 2008 00:47:06 +0800 Subject: mesh ordering and partition References: Message-ID: <92116AFF53544591A677E9E5886DF53C@ustcatmel> ----- Original Message ----- From: "Matthew Knepley" To: Sent: Saturday, May 17, 2008 12:19 AM Subject: Re: mesh ordering and partition > > I think, if you are using the serial PETSc ILU, you should just use a > MatOrdering, > which can be done from the command line: > > -pc_type ilu -pc_factor_mat_ordering_type rcm > > which I tested on KSP ex2. > > Matt I am developing parallel code for 3D semiconductor device simulation. >From the experience of 2D code, the GMRES solver with ILU works well (the matrix is asymmetric.) As a result, I'd like to use GMRES+ILU again for 3D, in parallel. Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? Since the parallel martrix requires continuous index in subdomain, the matrix ordering seems troublesome. maybe only a local ordering can be done... Am I right? 
Gong Ding From bsmith at mcs.anl.gov Fri May 16 12:42:56 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 May 2008 12:42:56 -0500 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: <67F2BC12-21B2-4BEF-9B16-3053B6E2A6E9@mcs.anl.gov> On May 16, 2008, at 11:47 AM, Gong Ding wrote: > > ----- Original Message ----- From: "Matthew Knepley" > > To: > Sent: Saturday, May 17, 2008 12:19 AM > Subject: Re: mesh ordering and partition > > >> >> I think, if you are using the serial PETSc ILU, you should just use a >> MatOrdering, >> which can be done from the command line: >> >> -pc_type ilu -pc_factor_mat_ordering_type rcm >> >> which I tested on KSP ex2. >> >> Matt > > > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well > (the matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, > the matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? > PETSc does not have any parallel ILU, so when you run in parallel you must be either using block Jacobi or the overlapping additive Schwarz method (block Jacobi with overlap between the blocks) and ILU on the subdomains. In this case you must use the -sub prefix on ilu and ordering >> -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm > The RCM ordering is done on the submatrix on each process, it is not parallel. It is important to note also that though rcm "may" improve the convergence rate of the ILU slightly, using an ordering on the factorization does require some permutation of the vectors on input and output to the MatSolve (which takes a little bit of time). You really need to run both and see if one is faster than the other (use -log_summary as an option). Barry > Gong Ding > > > From knepley at gmail.com Fri May 16 12:37:52 2008 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 May 2008 12:37:52 -0500 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: On Fri, May 16, 2008 at 11:47 AM, Gong Ding wrote: > > ----- Original Message ----- From: "Matthew Knepley" > To: > Sent: Saturday, May 17, 2008 12:19 AM > Subject: Re: mesh ordering and partition > > >> >> I think, if you are using the serial PETSc ILU, you should just use a >> MatOrdering, >> which can be done from the command line: >> >> -pc_type ilu -pc_factor_mat_ordering_type rcm >> >> which I tested on KSP ex2. >> >> Matt > > > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well (the > matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, the > matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? Its a local ordering. Remember that Block-Jacobi ILU is a LOT worse than serial ILU. I would not expect it to scale very well. You can try ASM to fix it up, but there are no guarantees. 
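Concretely, the two parallel variants discussed above can be compared with command lines along these lines (./app stands in for the user's executable; option names as in petsc-2.3.x):

  # block Jacobi with ILU + RCM ordering on each subdomain
  mpirun -np 4 ./app -ksp_type gmres -pc_type bjacobi \
         -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm \
         -ksp_converged_reason -log_summary

  # additive Schwarz (overlap 1) with the same subdomain solver
  mpirun -np 4 ./app -ksp_type gmres -pc_type asm -pc_asm_overlap 1 \
         -sub_pc_type ilu -sub_pc_factor_mat_ordering_type rcm \
         -ksp_converged_reason -log_summary

Comparing the iteration counts from -ksp_converged_reason and the timings from -log_summary is the quickest way to see whether the RCM ordering and the extra overlap actually pay off, as suggested above.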
Matt > Gong Ding > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jed at 59A2.org Fri May 16 12:58:58 2008 From: jed at 59A2.org (Jed Brown) Date: Fri, 16 May 2008 19:58:58 +0200 Subject: mesh ordering and partition In-Reply-To: <92116AFF53544591A677E9E5886DF53C@ustcatmel> References: <92116AFF53544591A677E9E5886DF53C@ustcatmel> Message-ID: <20080516175858.GH21713@brakk.ethz.ch> On Sat 2008-05-17 00:47, Gong Ding wrote: > I am developing parallel code for 3D semiconductor device simulation. > From the experience of 2D code, the GMRES solver with ILU works well (the > matrix is asymmetric.) > As a result, I'd like to use GMRES+ILU again for 3D, in parallel. > Does -pc_type ilu -pc_factor_mat_ordering_type rcm still work? > Since the parallel martrix requires continuous index in subdomain, the > matrix ordering seems troublesome. > maybe only a local ordering can be done... Am I right? For parallel ILU, you could try -pc_type hypre -pc_hypre_type euclid. Unfortunately, ILU requires a lot of communication so the parallel scaling tends to be poor. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From w_subber at yahoo.com Tue May 20 14:12:20 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 12:12:20 -0700 (PDT) Subject: MatMerge_SeqsToMPI Message-ID: <601684.20346.qm@web38204.mail.mud.yahoo.com> Hi, I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up sparse sequential matrices (SeqAIJ) from each CPU. I am using MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse scall,Mat *mpimat) to do that. However, when I compile the code I get the following undefined reference to `matmerge_seqstompi_' collect2: ld returned 1 exit status make: *** [all] Error 1 Am I using this function correctly ? Thanks Waad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 20 14:55:47 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 14:55:47 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <601684.20346.qm@web38204.mail.mud.yahoo.com> References: <601684.20346.qm@web38204.mail.mud.yahoo.com> Message-ID: On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: > Hi, > > I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse > scall,Mat *mpimat) > > to do that. However, when I compile the code I get the following > > undefined reference to `matmerge_seqstompi_' > collect2: ld returned 1 exit status > make: *** [all] Error 1 > > Am I using this function correctly ? These have no Fortran bindings right now. Matt > Thanks > > Waad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Tue May 20 15:16:44 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 13:16:44 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <853663.98811.qm@web38203.mail.mud.yahoo.com> Thank you Matt, Any suggestion to solve the problem I am trying to tackle. 
I want to solve a linear system: Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. Where A_i is a sparse sequential matrix and f_i is a sequential vector. Each CPU has one matrix and one vector of the same size. Now I want to sum up and solve the system in parallel. Thanks again Waad Matthew Knepley wrote: On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: > Hi, > > I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse > scall,Mat *mpimat) > > to do that. However, when I compile the code I get the following > > undefined reference to `matmerge_seqstompi_' > collect2: ld returned 1 exit status > make: *** [all] Error 1 > > Am I using this function correctly ? These have no Fortran bindings right now. Matt > Thanks > > Waad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 20 15:49:59 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 20 May 2008 15:49:59 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <853663.98811.qm@web38203.mail.mud.yahoo.com> References: <853663.98811.qm@web38203.mail.mud.yahoo.com> Message-ID: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> On May 20, 2008, at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to > solve a linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > Where A_i is a sparse sequential matrix and f_i is a sequential > vector. Each CPU has one matrix and one vector of the same size. Now > I want to sum up and solve the system in parallel. Does each A_i have nonzero entries (mostly) associated with one part of the matrix? Or does each process have values scattered all around the matrix? In the former case you should simply create one parallel MPIAIJ matrix and call MatSetValues() to put the values into it. We don't have any kind of support for the later case, perhaps if you describe how the matrix entries come about someone would have suggestions on how to proceed. Barry > > > Thanks again > > Waad > > Matthew Knepley wrote: On Tue, May 20, 2008 at > 2:12 PM, Waad Subber wrote: > > Hi, > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > adding up > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > n,MatReuse > > scall,Mat *mpimat) > > > > to do that. However, when I compile the code I get the following > > > > undefined reference to `matmerge_seqstompi_' > > collect2: ld returned 1 exit status > > make: *** [all] Error 1 > > > > Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > > > Thanks > > > > Waad > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
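A minimal sketch of the approach Barry describes above (one parallel MPIAIJ matrix filled with MatSetValues); the sequential matrix Aseq, its global size N, and the preallocation counts are assumptions, and in this form every process may generate many off-process entries, so the assembly step carries the communication:

    #include "petscmat.h"

    /* Sum N x N sequential matrices A_i (one per process) into a single parallel
       MPIAIJ matrix.  The preallocation numbers (50 per row) are placeholders;
       a real code should count the nonzeros per row.                            */
    PetscErrorCode MergeSeqMats(MPI_Comm comm,Mat Aseq,PetscInt N,Mat *Apar)
    {
      Mat                A;
      PetscInt           i,ncols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscErrorCode     ierr;

      PetscFunctionBegin;
      ierr = MatCreateMPIAIJ(comm,PETSC_DECIDE,PETSC_DECIDE,N,N,50,PETSC_NULL,50,PETSC_NULL,&A);CHKERRQ(ierr);
      for (i=0; i<N; i++) {                      /* each process adds its own A_i */
        ierr = MatGetRow(Aseq,i,&ncols,&cols,&vals);CHKERRQ(ierr);
        if (ncols) {ierr = MatSetValues(A,1,&i,ncols,cols,vals,ADD_VALUES);CHKERRQ(ierr);}
        ierr = MatRestoreRow(Aseq,i,&ncols,&cols,&vals);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);  /* off-process entries are summed here */
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *Apar = A;
      PetscFunctionReturn(0);
    }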
> -- Norbert Wiener > > > From knepley at gmail.com Tue May 20 15:49:15 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 15:49:15 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <853663.98811.qm@web38203.mail.mud.yahoo.com> References: <853663.98811.qm@web38203.mail.mud.yahoo.com> Message-ID: Is there a reason not to assemble the whole system at once? This is usually much easier. You can even use the same indices with MatSetValuesLocal(). Matt On Tue, May 20, 2008 at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to solve a > linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > Where A_i is a sparse sequential matrix and f_i is a sequential vector. > Each CPU has one matrix and one vector of the same size. Now I want to sum > up and solve the system in parallel. > > Thanks again > > Waad > > Matthew Knepley wrote: > > On Tue, May 20, 2008 at 2:12 PM, Waad Subber wrote: >> Hi, >> >> I am trying to construct a sparse parallel matrix (MPIAIJ) by adding up >> sparse sequential matrices (SeqAIJ) from each CPU. I am using >> >> MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt n,MatReuse >> scall,Mat *mpimat) >> >> to do that. However, when I compile the code I get the following >> >> undefined reference to `matmerge_seqstompi_' >> collect2: ld returned 1 exit status >> make: *** [all] Error 1 >> >> Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > >> Thanks >> >> Waad >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From w_subber at yahoo.com Tue May 20 19:15:25 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 17:15:25 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> Message-ID: <75545.31527.qm@web38205.mail.mud.yahoo.com> Thank you Matt and Barry, The system I am trying to solve is the interface problem in iterative substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is [R_i^T*g_i]. Each process constructs the local Schur complement matrix (S_i) , the restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential vector. Now having the Schur complement matrix for each subdomain, I need to solve the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. of process (subdomains) in parallel. For the global vector I construct one MPI vector and use VecGetArray () for each of the sequential vector then use VecSetValues () to add the values into the global MPI vector. That works fine. However for the global schur complement matix I try the same idea by creating one parallel MPIAIJ matrix and using MatGetArray( ) and MatSetValues () in order to add the values to the global matrix. MatGetArray( ) gives me only the values without indices, so I don't know how to add these valuse to the global MPI matrix. Thanks agin Waad Barry Smith wrote: On May 20, 2008, at 3:16 PM, Waad Subber wrote: > Thank you Matt, > > Any suggestion to solve the problem I am trying to tackle. I want to > solve a linear system: > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. 
> > Where A_i is a sparse sequential matrix and f_i is a sequential > vector. Each CPU has one matrix and one vector of the same size. Now > I want to sum up and solve the system in parallel. Does each A_i have nonzero entries (mostly) associated with one part of the matrix? Or does each process have values scattered all around the matrix? In the former case you should simply create one parallel MPIAIJ matrix and call MatSetValues() to put the values into it. We don't have any kind of support for the later case, perhaps if you describe how the matrix entries come about someone would have suggestions on how to proceed. Barry > > > Thanks again > > Waad > > Matthew Knepley wrote: On Tue, May 20, 2008 at > 2:12 PM, Waad Subber wrote: > > Hi, > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > adding up > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > n,MatReuse > > scall,Mat *mpimat) > > > > to do that. However, when I compile the code I get the following > > > > undefined reference to `matmerge_seqstompi_' > > collect2: ld returned 1 exit status > > make: *** [all] Error 1 > > > > Am I using this function correctly ? > > These have no Fortran bindings right now. > > Matt > > > Thanks > > > > Waad > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 20 19:30:06 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 May 2008 19:30:06 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <75545.31527.qm@web38205.mail.mud.yahoo.com> References: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> <75545.31527.qm@web38205.mail.mud.yahoo.com> Message-ID: On Tue, May 20, 2008 at 7:15 PM, Waad Subber wrote: > Thank you Matt and Barry, > > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. > > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. > of process (subdomains) in parallel. Barry knows much more than me about substructuring, however: You could form this matrix in parallel, but I thought that involved significant communication. I would think that an unassembled form would be better. To do this you would 1) Construct the VecScatter from the set of local vectors to the global vector 2) Create a MatShell and put the VecScatter in the user context 3) For MatMult(), you would a) Scatter the input vector into the local copies b) Do each local MatMult() c) Scatter the result back to the output vector Some work, but not that complicated. Matt > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. 
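A rough sketch of the unassembled shell-matrix recipe Matt outlines above; the context structure, its field names, and the scatter are hypothetical, error handling is trimmed, and only the MatMult is shown:

    #include "petscksp.h"

    /* Hypothetical context: a local (sequential) operator, local work vectors,
       and a scatter between the global interface vector and the local vector. */
    typedef struct {
      Mat        Aloc;        /* local S_i (or any sequential operator) */
      Vec        xloc, yloc;  /* sequential work vectors of local size  */
      VecScatter scat;        /* global vector <-> local vector         */
    } ShellCtx;

    /* y = sum_i R_i^T A_i R_i x, applied without assembling the sum. */
    PetscErrorCode ShellMult(Mat S,Vec x,Vec y)
    {
      ShellCtx      *ctx;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatShellGetContext(S,(void**)&ctx);CHKERRQ(ierr);
      /* Note: in 2.3.x releases the VecScatter is the *last* argument of VecScatterBegin/End. */
      ierr = VecScatterBegin(ctx->scat,x,ctx->xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx->scat,x,ctx->xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = MatMult(ctx->Aloc,ctx->xloc,ctx->yloc);CHKERRQ(ierr);     /* local multiply */
      ierr = VecZeroEntries(y);CHKERRQ(ierr);
      ierr = VecScatterBegin(ctx->scat,ctx->yloc,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx->scat,ctx->yloc,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* ...  MatCreateShell(comm,m,m,M,M,(void*)ctx,&S);
            MatShellSetOperation(S,MATOP_MULT,(void(*)(void))ShellMult);  ... */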
> MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > >> Thank you Matt, >> >> Any suggestion to solve the problem I am trying to tackle. I want to >> solve a linear system: >> >> Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. >> >> Where A_i is a sparse sequential matrix and f_i is a sequential >> vector. Each CPU has one matrix and one vector of the same size. Now >> I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > >> >> >> Thanks again >> >> Waad >> >> Matthew Knepley wrote: On Tue, May 20, 2008 at >> 2:12 PM, Waad Subber wrote: >> > Hi, >> > >> > I am trying to construct a sparse parallel matrix (MPIAIJ) by >> adding up >> > sparse sequential matrices (SeqAIJ) from each CPU. I am using >> > >> > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt >> n,MatReuse >> > scall,Mat *mpimat) >> > >> > to do that. However, when I compile the code I get the following >> > >> > undefined reference to `matmerge_seqstompi_' >> > collect2: ld returned 1 exit status >> > make: *** [all] Error 1 >> > >> > Am I using this function correctly ? >> >> These have no Fortran bindings right now. >> >> Matt >> >> > Thanks >> > >> > Waad >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From dalcinl at gmail.com Tue May 20 19:44:32 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 20 May 2008 21:44:32 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <75545.31527.qm@web38205.mail.mud.yahoo.com> References: <61B54D54-BB3D-4A7F-9AF9-5499559FA487@mcs.anl.gov> <75545.31527.qm@web38205.mail.mud.yahoo.com> Message-ID: On 5/20/08, Waad Subber wrote: > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. Two questions: 1) How do you actually get the local Schur complements. You explicitelly compute its entries, or do you compute it after computing the inverse (or LU factors) of a 'local' matrix? 2) Your R_i matrix is actually a matrix? In that case, it is a trivial restrinction operation with ones and zeros? Or R_i is actually a VecScatter? And finally: are you trying to apply a Krylov method over the global Schur complement? In such a case, are you going to implement a preconditioner for it? > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > .. i=1.. to No. 
of process (subdomains) in parallel. > > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > Thank you Matt, > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > solve a linear system: > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > vector. Each CPU has one matrix and one vector of the same size. Now > > I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > > > > > > > Thanks again > > > > Waad > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > 2:12 PM, Waad Subber wrote: > > > Hi, > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > adding up > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > n,MatReuse > > > scall,Mat *mpimat) > > > > > > to do that. However, when I compile the code I get the following > > > > > > undefined reference to `matmerge_seqstompi_' > > > collect2: ld returned 1 exit status > > > make: *** [all] Error 1 > > > > > > Am I using this function correctly ? > > > > These have no Fortran bindings right now. > > > > Matt > > > > > Thanks > > > > > > Waad > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Tue May 20 19:56:02 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 17:56:02 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <830531.15572.qm@web38206.mail.mud.yahoo.com> Thank you Matt It seems a god idea I 'll try it. Waad Matthew Knepley wrote: On Tue, May 20, 2008 at 7:15 PM, Waad Subber wrote: > Thank you Matt and Barry, > > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. 
> > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], .. i=1.. to No. > of process (subdomains) in parallel. Barry knows much more than me about substructuring, however: You could form this matrix in parallel, but I thought that involved significant communication. I would think that an unassembled form would be better. To do this you would 1) Construct the VecScatter from the set of local vectors to the global vector 2) Create a MatShell and put the VecScatter in the user context 3) For MatMult(), you would a) Scatter the input vector into the local copies b) Do each local MatMult() c) Scatter the result back to the output vector Some work, but not that complicated. Matt > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > >> Thank you Matt, >> >> Any suggestion to solve the problem I am trying to tackle. I want to >> solve a linear system: >> >> Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. >> >> Where A_i is a sparse sequential matrix and f_i is a sequential >> vector. Each CPU has one matrix and one vector of the same size. Now >> I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > >> >> >> Thanks again >> >> Waad >> >> Matthew Knepley wrote: On Tue, May 20, 2008 at >> 2:12 PM, Waad Subber wrote: >> > Hi, >> > >> > I am trying to construct a sparse parallel matrix (MPIAIJ) by >> adding up >> > sparse sequential matrices (SeqAIJ) from each CPU. I am using >> > >> > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt >> n,MatReuse >> > scall,Mat *mpimat) >> > >> > to do that. However, when I compile the code I get the following >> > >> > undefined reference to `matmerge_seqstompi_' >> > collect2: ld returned 1 exit status >> > make: *** [all] Error 1 >> > >> > Am I using this function correctly ? >> >> These have no Fortran bindings right now. >> >> Matt >> >> > Thanks >> > >> > Waad >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From w_subber at yahoo.com Tue May 20 20:06:27 2008 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 20 May 2008 18:06:27 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <855669.85397.qm@web38202.mail.mud.yahoo.com> Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > The system I am trying to solve is the interface problem in iterative > substructuring DDM. Where A_i represents [R_i^T*S_i*R_i] and f_i is > [R_i^T*g_i]. > > Each process constructs the local Schur complement matrix (S_i) , the > restriction matrix(R_i) as SeqAIJ and the RHS vector (g_i) as a sequential > vector. Two questions: 1) How do you actually get the local Schur complements. You explicitelly compute its entries, or do you compute it after computing the inverse (or LU factors) of a 'local' matrix? I construct the local Schur complement matrices after getting the inversion of A_II matrix for each subdomain. 2) Your R_i matrix is actually a matrix? In that case, it is a trivial restrinction operation with ones and zeros? Or R_i is actually a VecScatter? R_i is the restriction matrix maps the global boundary nodes to the local boundary nodes and its entries is zero and one I store it as spare matrix, so only I need to store the nonzero entries which one entry per a row And finally: are you trying to apply a Krylov method over the global Schur complement? In such a case, are you going to implement a preconditioner for it? Yes, that what I am trying to do > Now having the Schur complement matrix for each subdomain, I need to solve > the interface problem (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > .. i=1.. to No. of process (subdomains) in parallel. > > For the global vector I construct one MPI vector and use VecGetArray () for > each of the sequential vector then use VecSetValues () to add the values > into the global MPI vector. That works fine. > > However for the global schur complement matix I try the same idea by > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > MatSetValues () in order to add the values to the global matrix. > MatGetArray( ) gives me only the values without indices, so I don't know how > to add these valuse to the global MPI matrix. > > Thanks agin > > Waad > > Barry Smith wrote: > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > Thank you Matt, > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > solve a linear system: > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > vector. Each CPU has one matrix and one vector of the same size. Now > > I want to sum up and solve the system in parallel. > > Does each A_i have nonzero entries (mostly) associated with one > part of the matrix? Or does each process have values > scattered all around the matrix? > > In the former case you should simply create one parallel MPIAIJ > matrix and call MatSetValues() to put the values > into it. We don't have any kind of support for the later case, perhaps > if you describe how the matrix entries come about someone > would have suggestions on how to proceed. > > Barry > > > > > > > Thanks again > > > > Waad > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > 2:12 PM, Waad Subber wrote: > > > Hi, > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > adding up > > > sparse sequential matrices (SeqAIJ) from each CPU. 
I am using > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > n,MatReuse > > > scall,Mat *mpimat) > > > > > > to do that. However, when I compile the code I get the following > > > > > > undefined reference to `matmerge_seqstompi_' > > > collect2: ld returned 1 exit status > > > make: *** [all] Error 1 > > > > > > Am I using this function correctly ? > > > > These have no Fortran bindings right now. > > > > Matt > > > > > Thanks > > > > > > Waad > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdettrick at gmail.com Wed May 21 11:14:41 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:14:41 -0700 Subject: mixed matrix type? Message-ID: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Hi, I have a sparse N*N matrix generated from a DA and a 5 point stencil, with a total of approx 5*N non-zero entries. Now I would like to extend this matrix by adding a smaller M*M dense matrix to the bottom right hand side, i.e. so that there is a dense square in the bottom right hand corner of the otherwise sparse matrix. The total number of new non-zero entries, M*M, is comparable to the total number of old entries, 5*N. On top of this, there would be a small number of non- zero entries in the new upper-right and lower-left rectangular portions of the matrix, due to coupling of the two systems. The new total matrix size (including zeroes) would be (N+M)*(N+M). Can anybody recommend a Mat type to store the new matrix? One possibility I was thinking of was to establish the original sparse Mat with a DA in a sub-communicator (with half the CPUs), and get the ownership range with MatGetOwnershipRange. Then in the petsc_comm_world communicator, the complete matrix could be constructed (by element-wise copying the old one I suppose), and the ownership range could be maintained manually. Does this sound like a reasonable strategy? I would very much appreciate any suggestions or advice. Thanks, Sean Dettrick From sdettrick at gmail.com Wed May 21 11:21:43 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:21:43 -0700 Subject: Fwd: mixed matrix type? References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: <7CB7A206-8DD3-49FB-9965-DD0F6E939058@gmail.com> I should have also mentioned that the matrix is symmetric. Thanks, Sean Begin forwarded message: > From: Sean Dettrick > Date: May 21, 2008 9:14:41 AM PDT > To: petsc-users at mcs.anl.gov > Subject: mixed matrix type? > > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point > stencil, with a total of approx 5*N non-zero entries. Now I would > like to extend this matrix by adding a smaller M*M dense matrix to > the bottom right hand side, i.e. so that there is a dense square in > the bottom right hand corner of the otherwise sparse matrix. The > total number of new non-zero entries, M*M, is comparable to the > total number of old entries, 5*N. 
On top of this, there would be a > small number of non-zero entries in the new upper-right and lower- > left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N > +M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original > sparse Mat with a DA in a sub-communicator (with half the CPUs), and > get the ownership range with MatGetOwnershipRange. Then in the > petsc_comm_world communicator, the complete matrix could be > constructed (by element-wise copying the old one I suppose), and the > ownership range could be maintained manually. > > Does this sound like a reasonable strategy? > > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 21 11:26:21 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 21 May 2008 11:26:21 -0500 Subject: mixed matrix type? In-Reply-To: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: We would like very much to have flexible code that did all this stuff for you, sadly we are far far away from this. I would suggest looking at DAGetMatrix() and pick the one that matches your DA (2d or 3d) etc. Then you could modify the code that preallocates the matrix with the additional preallocation information and then modify the part that puts in the locations of nonzeros to match your structure. This way you will get perfect preallocation (which will be important for you to get good speed). I would just use the MPIAIJ matrix format for now (and maybe forever is ok) because the inode code will make that smaller dense part pretty fast anyways without requiring massively complicated matrix data structures that store parts dense and parts sparse. Barry On May 21, 2008, at 11:14 AM, Sean Dettrick wrote: > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point > stencil, with a total of approx 5*N non-zero entries. Now I would > like to extend this matrix by adding a smaller M*M dense matrix to > the bottom right hand side, i.e. so that there is a dense square in > the bottom right hand corner of the otherwise sparse matrix. The > total number of new non-zero entries, M*M, is comparable to the > total number of old entries, 5*N. On top of this, there would be a > small number of non-zero entries in the new upper-right and lower- > left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N > +M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original > sparse Mat with a DA in a sub-communicator (with half the CPUs), and > get the ownership range with MatGetOwnershipRange. Then in the > petsc_comm_world communicator, the complete matrix could be > constructed (by element-wise copying the old one I suppose), and the > ownership range could be maintained manually. > > Does this sound like a reasonable strategy? > > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick > > From knepley at gmail.com Wed May 21 11:29:17 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 21 May 2008 11:29:17 -0500 Subject: mixed matrix type? 
In-Reply-To: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: On Wed, May 21, 2008 at 11:14 AM, Sean Dettrick wrote: > Hi, > > I have a sparse N*N matrix generated from a DA and a 5 point stencil, with a > total of approx 5*N non-zero entries. Now I would like to extend this > matrix by adding a smaller M*M dense matrix to the bottom right hand side, > i.e. so that there is a dense square in the bottom right hand corner of the > otherwise sparse matrix. The total number of new non-zero entries, M*M, is > comparable to the total number of old entries, 5*N. On top of this, there > would be a small number of non-zero entries in the new upper-right and > lower-left rectangular portions of the matrix, due to coupling of the two > systems. The new total matrix size (including zeroes) would be (N+M)*(N+M). > > Can anybody recommend a Mat type to store the new matrix? > > One possibility I was thinking of was to establish the original sparse Mat > with a DA in a sub-communicator (with half the CPUs), and get the ownership > range with MatGetOwnershipRange. Then in the petsc_comm_world communicator, > the complete matrix could be constructed (by element-wise copying the old > one I suppose), and the ownership range could be maintained manually. > > Does this sound like a reasonable strategy? Not really. If you only want parallelism suitable for the M matrix, I would use a Schur complement strategy: 1) Make the DA matrix 2) Make M, and the coupling matrices B, B^T 3) Make a MatShell that has a MatMult() that applies B^T A^{-1} B + M with A^{-1} done using a KSPSolve(). This seems easier to program and solve to me than the fully assembled one. We hope later to have something built to do the full assembly. Matt > I would very much appreciate any suggestions or advice. > > Thanks, > > Sean Dettrick > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From sdettrick at gmail.com Wed May 21 11:45:23 2008 From: sdettrick at gmail.com (Sean Dettrick) Date: Wed, 21 May 2008 09:45:23 -0700 Subject: mixed matrix type? In-Reply-To: References: <3CAE188A-9C1A-4C76-B82E-02CB868F7950@gmail.com> Message-ID: <9E9D647B-94FC-4F33-BFF0-8D150CA5509C@gmail.com> Thanks Matt, thanks Barry, much appreciated. Best, Sean From dalcinl at gmail.com Thu May 22 15:15:27 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 22 May 2008 17:15:27 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <855669.85397.qm@web38202.mail.mud.yahoo.com> References: <855669.85397.qm@web38202.mail.mud.yahoo.com> Message-ID: On 5/20/08, Waad Subber wrote: > 1) How do you actually get the local Schur complements. You > explicitelly compute its entries, or do you compute it after computing > the inverse (or LU factors) of a 'local' matrix? > > I construct the local Schur complement matrices after getting the inversion > of A_II matrix for each subdomain. Fine, > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > restrinction operation with ones and zeros? Or R_i is actually a > VecScatter? > > R_i is the restriction matrix maps the global boundary nodes to the local > boundary nodes and its entries is zero and one I store it as spare matrix, > so only I need to store the nonzero entries which one entry per a row I believe a VecScatter will perform much better for this task. 
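A minimal sketch of building that restriction as a VecScatter; the array bdry_global[] (global interface index of each local boundary node) and its length n_bdry are assumed to be known from the partitioning, and the IS/VecScatter calling sequences follow the 2.3.3-era API (newer releases add a PetscCopyMode argument to ISCreateGeneral and take a pointer in ISDestroy):

    #include "petscvec.h"

    PetscErrorCode BuildRestriction(Vec uGlobal,PetscInt n_bdry,const PetscInt bdry_global[],
                                    Vec *uLocal,VecScatter *scat)
    {
      IS             is_glob,is_loc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = VecCreateSeq(PETSC_COMM_SELF,n_bdry,uLocal);CHKERRQ(ierr);
      ierr = ISCreateGeneral(PETSC_COMM_SELF,n_bdry,bdry_global,&is_glob);CHKERRQ(ierr);
      ierr = ISCreateStride(PETSC_COMM_SELF,n_bdry,0,1,&is_loc);CHKERRQ(ierr);
      ierr = VecScatterCreate(uGlobal,is_glob,*uLocal,is_loc,scat);CHKERRQ(ierr);
      ierr = ISDestroy(is_glob);CHKERRQ(ierr);
      ierr = ISDestroy(is_loc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Applying R_i is then a forward scatter, and applying R_i^T (with the summation over subdomains) is a reverse scatter with ADD_VALUES, as in the shell multiply sketched earlier in this thread.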
> And finally: are you trying to apply a Krylov method over the global > Schur complement? In such a case, are you going to implement a > preconditioner for it? > > Yes, that what I am trying to do Well, please let me make some comments. I've spent many days and month optimizing Schur complement iterations, and I ended giving up. I was never able to get it perform better than ASM preconditioner (iff appropriatelly used, ie. solving local problems with LU, and implementing subdomain subpartitioning the smart way, not the way currently implemented in PETSc, were subpartitioning is done by chunks of continuous rows). If you are doing research on this, I would love to know your conclusion when you get your work done. If you are doing all this just with the hope of getting better running times, well, remember my above comments but also remember that I do not consider myself a smart guy ;-) As I said before, I worked hard for implementing general Schur complement iteration. All this code is avalable in the SVN repository of petsc4py (PETSc for Python), but it could be easily stripped out for use in any PETSc-based code in C/C++. This implementation requires the use of a MATIS matrix type (there is also a separate implementation for MATMPIAIJ maatrices), I've implemented subdomain subpartitioning (using a simple recursive graph splitting procedure reusing matrix reordering routines built-in in PETSc, could be done better with METIS); when the A_ii problems are large, their LU factorization can be a real bootleneck. I've even implemented a global preconditioner operation for the interface problem, based on iterating over a 'strip' of nodes around the interface; it improves convergence and is usefull for ill-conditioned systems, but the costs are increased. If you ever want to take a look at my implemention for try to use it, or perhaps take ideas for your own implementation, let me know. > > Now having the Schur complement matrix for each subdomain, I need to solve > > the interface problem > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > .. i=1.. to No. of process (subdomains) in parallel. > > > > For the global vector I construct one MPI vector and use VecGetArray () > for > > each of the sequential vector then use VecSetValues () to add the values > > into the global MPI vector. That works fine. > > > > However for the global schur complement matix I try the same idea by > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > MatSetValues () in order to add the values to the global matrix. > > MatGetArray( ) gives me only the values without indices, so I don't know > how > > to add these valuse to the global MPI matrix. > > > > Thanks agin > > > > Waad > > > > Barry Smith wrote: > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > Thank you Matt, > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > solve a linear system: > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > I want to sum up and solve the system in parallel. > > > > Does each A_i have nonzero entries (mostly) associated with one > > part of the matrix? Or does each process have values > > scattered all around the matrix? > > > > In the former case you should simply create one parallel MPIAIJ > > matrix and call MatSetValues() to put the values > > into it. 
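Related to the Schur complement discussion in this thread, and to Matt's earlier point that an inverse can be applied through an inner KSPSolve() inside a shell matrix, here is a hypothetical sketch of applying a subdomain Schur complement S = A_BB - A_BI A_II^{-1} A_IB without ever forming the inverse of A_II; the block names and work vectors are assumptions, and the inner KSP would typically be set to a direct factorization (KSPPREONLY with PCLU), as also recommended later in the thread:

    #include "petscksp.h"

    typedef struct {
      Mat A_BB,A_BI,A_IB;   /* blocks of the local subdomain matrix      */
      KSP kspI;             /* inner solver for A_II                     */
      Vec wI1,wI2,wB;       /* interior- and boundary-sized work vectors */
    } SchurCtx;

    /* y = A_BB x - A_BI * ( A_II^{-1} * (A_IB x) ) */
    PetscErrorCode SchurMult(Mat S,Vec x,Vec y)
    {
      SchurCtx      *c;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatShellGetContext(S,(void**)&c);CHKERRQ(ierr);
      ierr = MatMult(c->A_IB,x,c->wI1);CHKERRQ(ierr);         /* wI1 = A_IB x        */
      ierr = KSPSolve(c->kspI,c->wI1,c->wI2);CHKERRQ(ierr);   /* wI2 = A_II^{-1} wI1 */
      ierr = MatMult(c->A_BI,c->wI2,c->wB);CHKERRQ(ierr);     /* wB  = A_BI wI2      */
      ierr = MatMult(c->A_BB,x,y);CHKERRQ(ierr);              /* y   = A_BB x        */
      ierr = VecAXPY(y,-1.0,c->wB);CHKERRQ(ierr);             /* y  -= wB            */
      PetscFunctionReturn(0);
    }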
We don't have any kind of support for the later case, perhaps > > if you describe how the matrix entries come about someone > > would have suggestions on how to proceed. > > > > Barry > > > > > > > > > > > Thanks again > > > > > > Waad > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > 2:12 PM, Waad Subber wrote: > > > > Hi, > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > adding up > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > n,MatReuse > > > > scall,Mat *mpimat) > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > collect2: ld returned 1 exit status > > > > make: *** [all] Error 1 > > > > > > > > Am I using this function correctly ? > > > > > > These have no Fortran bindings right now. > > > > > > Matt > > > > > > > Thanks > > > > > > > > Waad > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From amjad11 at gmail.com Fri May 23 00:30:42 2008 From: amjad11 at gmail.com (amjad ali) Date: Fri, 23 May 2008 10:30:42 +0500 Subject: general question on speed using quad core Xeons In-Reply-To: References: <48054602.9040200@gmail.com> Message-ID: <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> Hello all, specially Dr. Matt, On 4/16/08, Matthew Knepley wrote: > > On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie > wrote: > > I'm running my PETSc code on a cluster of quad core Xeon's connected > > by Infiniband. I hadn't much worried about the performance, because > > everything seemed to be working quite well, but today I was actually > > comparing performance (wall clock time) for the same problem, but on > > different combinations of CPUS. > > > > I find that my PETSc code is quite scalable until I start to use > > multiple cores/cpu. > > > > For example, the run time doesn't improve by going from 1 core/cpu > > to 4 cores/cpu, and I find this to be very strange, especially since > > looking at top or Ganglia, all 4 cpus on each node are running at 100% > > almost > > all of the time. I would have thought if the cpus were going all out, > > that I would still be getting much more scalable results. > > Those a really coarse measures. There is absolutely no way that all cores > are going 100%. Its easy to show by hand. Take the peak flop rate and > this gives you the bandwidth needed to sustain that computation (if > everything is perfect, like axpy). You will find that the chip bandwidth > is far below this. 
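As a rough illustration of that argument (the numbers are only for scale, not measurements): a double-precision axpy y = y + a*x performs 2 flops per entry while moving 3 doubles, that is 24 bytes, giving an arithmetic intensity of 1/12 flop per byte. A core with a nominal peak of, say, 10 GFlop/s would therefore need about 120 GB/s of sustained memory bandwidth to run axpy at peak, while a whole multi-core socket of that era typically sustains on the order of 5 to 10 GB/s; sparse matrix-vector products sit in the same regime, which is why adding cores that share one memory bus does not make them proportionally faster.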
A nice analysis is in > > http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf > > > We are using mvapich-0.9.9 with infiniband. So, I don't know if > > this is a cluster/Xeon issue, or something else. > > This is actually mathematics! How satisfying. The only way to improve > this is to change the data structure (e.g. use blocks) or change the > algorithm (e.g. use spectral elements and unassembled structures) Would you please explain a bit about "unassembled structures"? Does Discontinuous Galerkin Method falls into this category? Thanks and Regrads, Amjad Ali. Matt > > > Anybody with experience on this? > > > > Thanks, Randy M. > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri May 23 02:52:13 2008 From: jed at 59A2.org (Jed Brown) Date: Fri, 23 May 2008 09:52:13 +0200 Subject: general question on speed using quad core Xeons In-Reply-To: <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> References: <48054602.9040200@gmail.com> <428810f20805222230p6100e1b8wb47e846bd52ff3fd@mail.gmail.com> Message-ID: <20080523075213.GE21713@brakk.ethz.ch> On Fri 2008-05-23 10:30, amjad ali wrote: > Would you please explain a bit about "unassembled structures"? > Does Discontinuous Galerkin Method falls into this category? I'm doing some work on this so I'll try to answer. There are two components which can be ``unassembled'' namely the matrix application and the preconditioner. In general, unassembled makes the most sense for semi-structured approximations where there is natural data granularity similar to L1 cache size. A standard example is the spectral or p-version finite element method. In these methods, the element stiffness matrix is dense and can be very large, but it is possible to apply it without storing the entries. For instance, suppose we have polynomial order p-1 on a hexahedral element. Then there are p^3 element degrees of freedom and the global matrix will have p^6 nonzeros contributed by this element (if we assemble it). For p=10, this is already big and will make preconditioning very expensive. On the other hand, we can apply the element Jacobian in O(p^3) space and O(p^4) time if we exploit a tensor product basis. Even if we don't use tensor products, the space requirement can stay the same. For high order p, this will be a lot less operations, but the important point with regard to FLOP/s is that there is now more work per amount of memory touched. If there are N elements, the total number of degrees of freedom is O(N p^3) and the matrix has O(N p^6) entries so multiplication is O(N p^6) time and space. If applied locally (i.e. scatter to local basis, apply there, scatter back) the space requirement is O(N p^3) and the time is O(N p^4) or O(N p^6) depending on the basis. In addition, the element operations can often be done entirely in L1 cache. Clearly this puts a lot less stress on the memory bus. Of course, just applying the Jacobian fast won't cut it, we also need a preconditioner. One way is to assemble a sparser approximation to the global Jacobian and apply standard preconditioners. Another is to apply a domain decomposition preconditioner which exploits the data granularity. For instance, Schur complement preconditioners can be formed based on solves on a single element. 
These are most attractive when there is a `fast' way to solve the local problem on a single element, but they can be expected to increase the FLOP/s rate either way because the memory access pattern requires more work for a given amount of memory. (I don't know a lot about DD preconditioners. I'm using the former approach, assembling a sparser approximation of the Jacobian.) Discontinuous Galerkin happens to be easy to implement for high order elements and the scatter from global to local basis is trivial. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From w_subber at yahoo.com Fri May 23 11:20:14 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 09:20:14 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <212555.70047.qm@web38205.mail.mud.yahoo.com> Thank you for the useful comments. For sure I will consider them. I've just starting my research by writing a DDM substructuring code which scales for now up-to 60 CPUs using petsc KSP solver for the interface problem and Lapack direct factorization for the interior problem. I split the domain using METIS library and assign each subdomain to one process then solve the global Schur complement using parallel preconditioned iterative solver. As an initial attempt, I solved a 2D elasticity problem (about 100,000 DOFs) within seconds using this algorithm. I notice Lapack solver for the interior problem takes a lot of time compare to the iterative solver for the interface, so now I am replacing the direct factorization with petsc KSP solver. I would like very much to have a look at your implementation, and I think that will be very useful to me. Thanks Waad Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > 1) How do you actually get the local Schur complements. You > explicitelly compute its entries, or do you compute it after computing > the inverse (or LU factors) of a 'local' matrix? > > I construct the local Schur complement matrices after getting the inversion > of A_II matrix for each subdomain. Fine, > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > restrinction operation with ones and zeros? Or R_i is actually a > VecScatter? > > R_i is the restriction matrix maps the global boundary nodes to the local > boundary nodes and its entries is zero and one I store it as spare matrix, > so only I need to store the nonzero entries which one entry per a row I believe a VecScatter will perform much better for this task. > And finally: are you trying to apply a Krylov method over the global > Schur complement? In such a case, are you going to implement a > preconditioner for it? > > Yes, that what I am trying to do Well, please let me make some comments. I've spent many days and month optimizing Schur complement iterations, and I ended giving up. I was never able to get it perform better than ASM preconditioner (iff appropriatelly used, ie. solving local problems with LU, and implementing subdomain subpartitioning the smart way, not the way currently implemented in PETSc, were subpartitioning is done by chunks of continuous rows). If you are doing research on this, I would love to know your conclusion when you get your work done. 
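To make Jed's operation count above concrete, here is a purely illustrative sum-factorization sketch for a single p^3 element: y(i,j,k) = sum_{l,m,n} A1(i,l) A2(j,m) A3(k,n) x(l,m,n) is applied in three sweeps of O(p^4) work and O(p^3) storage, instead of multiplying by the assembled p^3-by-p^3 element matrix at O(p^6); all names here are hypothetical and the routine is independent of PETSc:

    /* A1,A2,A3 are p x p (row major); x,y,t1,t2 hold p^3 doubles,
       with index (i,j,k) stored at position (i*p+j)*p+k.           */
    static void tensor_apply(int p,const double *A1,const double *A2,const double *A3,
                             const double *x,double *y,double *t1,double *t2)
    {
      int i,j,k,l; double s;
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* first direction  */
        for (s=0,l=0;l<p;l++) s += A1[i*p+l]*x[(l*p+j)*p+k];
        t1[(i*p+j)*p+k] = s;
      }
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* second direction */
        for (s=0,l=0;l<p;l++) s += A2[j*p+l]*t1[(i*p+l)*p+k];
        t2[(i*p+j)*p+k] = s;
      }
      for (i=0;i<p;i++) for (j=0;j<p;j++) for (k=0;k<p;k++) {   /* third direction  */
        for (s=0,l=0;l<p;l++) s += A3[k*p+l]*t2[(i*p+j)*p+l];
        y[(i*p+j)*p+k] = s;
      }
    }

Each sweep does p multiply-adds for each of the p^3 outputs, so the total work is 3 p^4 flops while only O(p^3) data is touched, which is the higher work-per-byte ratio discussed above.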
If you are doing all this just with the hope of getting better running times, well, remember my above comments but also remember that I do not consider myself a smart guy ;-) As I said before, I worked hard for implementing general Schur complement iteration. All this code is avalable in the SVN repository of petsc4py (PETSc for Python), but it could be easily stripped out for use in any PETSc-based code in C/C++. This implementation requires the use of a MATIS matrix type (there is also a separate implementation for MATMPIAIJ maatrices), I've implemented subdomain subpartitioning (using a simple recursive graph splitting procedure reusing matrix reordering routines built-in in PETSc, could be done better with METIS); when the A_ii problems are large, their LU factorization can be a real bootleneck. I've even implemented a global preconditioner operation for the interface problem, based on iterating over a 'strip' of nodes around the interface; it improves convergence and is usefull for ill-conditioned systems, but the costs are increased. If you ever want to take a look at my implemention for try to use it, or perhaps take ideas for your own implementation, let me know. > > Now having the Schur complement matrix for each subdomain, I need to solve > > the interface problem > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > .. i=1.. to No. of process (subdomains) in parallel. > > > > For the global vector I construct one MPI vector and use VecGetArray () > for > > each of the sequential vector then use VecSetValues () to add the values > > into the global MPI vector. That works fine. > > > > However for the global schur complement matix I try the same idea by > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > MatSetValues () in order to add the values to the global matrix. > > MatGetArray( ) gives me only the values without indices, so I don't know > how > > to add these valuse to the global MPI matrix. > > > > Thanks agin > > > > Waad > > > > Barry Smith wrote: > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > Thank you Matt, > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > solve a linear system: > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > I want to sum up and solve the system in parallel. > > > > Does each A_i have nonzero entries (mostly) associated with one > > part of the matrix? Or does each process have values > > scattered all around the matrix? > > > > In the former case you should simply create one parallel MPIAIJ > > matrix and call MatSetValues() to put the values > > into it. We don't have any kind of support for the later case, perhaps > > if you describe how the matrix entries come about someone > > would have suggestions on how to proceed. > > > > Barry > > > > > > > > > > > Thanks again > > > > > > Waad > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > 2:12 PM, Waad Subber wrote: > > > > Hi, > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > adding up > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > n,MatReuse > > > > scall,Mat *mpimat) > > > > > > > > to do that. 
However, when I compile the code I get the following > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > collect2: ld returned 1 exit status > > > > make: *** [all] Error 1 > > > > > > > > Am I using this function correctly ? > > > > > > These have no Fortran bindings right now. > > > > > > Matt > > > > > > > Thanks > > > > > > > > Waad > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 23 13:10:48 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 23 May 2008 13:10:48 -0500 Subject: MatMerge_SeqsToMPI In-Reply-To: <212555.70047.qm@web38205.mail.mud.yahoo.com> References: <212555.70047.qm@web38205.mail.mud.yahoo.com> Message-ID: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> On May 23, 2008, at 11:20 AM, Waad Subber wrote: > Thank you for the useful comments. For sure I will consider them. > I've just starting my research by writing a DDM substructuring code > which scales for now up-to 60 CPUs using petsc KSP solver for the > interface problem and Lapack direct factorization for the interior > problem. I split the domain using METIS library and assign each > subdomain to one process then solve the global Schur complement > using parallel preconditioned iterative solver. As an initial > attempt, I solved a 2D elasticity problem (about 100,000 DOFs) > within seconds using this algorithm. I notice Lapack solver for the > interior problem takes a lot of time compare to the iterative solver > for the interface, so now I am replacing the direct factorization > with petsc KSP solver. In my experience you will want to use a KSP direct solve, for example just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur complement is always way to slow. Barry > > > I would like very much to have a look at your implementation, and I > think that will be very useful to me. > > Thanks > > Waad > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber > wrote: > > 1) How do you actually get the local Schur complements. You > > explicitelly compute its entries, or do you compute it after > computing > > the inverse (or LU factors) of a 'local' matrix? > > > > I construct the local Schur complement matrices after getting the > inversion > > of A_II matrix for each subdomain. > > Fine, > > > 2) Your R_i matrix is actually a matrix? In that case, it is a > trivial > > restrinction operation with ones and zeros? Or R_i is actually a > > VecScatter? 
> > > > R_i is the restriction matrix maps the global boundary nodes to > the local > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > so only I need to store the nonzero entries which one entry per a > row > > I believe a VecScatter will perform much better for this task. > > > > And finally: are you trying to apply a Krylov method over the global > > Schur complement? In such a case, are you going to implement a > > preconditioner for it? > > > > Yes, that what I am trying to do > > Well, please let me make some comments. I've spent many days and month > optimizing Schur complement iterations, and I ended giving up. I was > never able to get it perform better than ASM preconditioner (iff > appropriatelly used, ie. solving local problems with LU, and > implementing subdomain subpartitioning the smart way, not the way > currently implemented in PETSc, were subpartitioning is done by chunks > of continuous rows). > > If you are doing research on this, I would love to know your > conclusion when you get your work done. If you are doing all this just > with the hope of getting better running times, well, remember my above > comments but also remember that I do not consider myself a smart guy > ;-) > > As I said before, I worked hard for implementing general Schur > complement iteration. All this code is avalable in the SVN repository > of petsc4py (PETSc for Python), but it could be easily stripped out > for use in any PETSc-based code in C/C++. This implementation requires > the use of a MATIS matrix type (there is also a separate > implementation for MATMPIAIJ maatrices), I've implemented subdomain > subpartitioning (using a simple recursive graph splitting procedure > reusing matrix reordering routines built-in in PETSc, could be done > better with METIS); when the A_ii problems are large, their LU > factorization can be a real bootleneck. I've even implemented a > global preconditioner operation for the interface problem, based on > iterating over a 'strip' of nodes around the interface; it improves > convergence and is usefull for ill-conditioned systems, but the costs > are increased. > > If you ever want to take a look at my implemention for try to use it, > or perhaps take ideas for your own implementation, let me know. > > > > > > > > Now having the Schur complement matrix for each subdomain, I > need to solve > > > the interface problem > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > For the global vector I construct one MPI vector and use > VecGetArray () > > for > > > each of the sequential vector then use VecSetValues () to add > the values > > > into the global MPI vector. That works fine. > > > > > > However for the global schur complement matix I try the same > idea by > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > MatSetValues () in order to add the values to the global matrix. > > > MatGetArray( ) gives me only the values without indices, so I > don't know > > how > > > to add these valuse to the global MPI matrix. > > > > > > Thanks agin > > > > > > Waad > > > > > > Barry Smith wrote: > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > Thank you Matt, > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I > want to > > > > solve a linear system: > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > vector. 
Each CPU has one matrix and one vector of the same > size. Now > > > > I want to sum up and solve the system in parallel. > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > part of the matrix? Or does each process have values > > > scattered all around the matrix? > > > > > > In the former case you should simply create one parallel MPIAIJ > > > matrix and call MatSetValues() to put the values > > > into it. We don't have any kind of support for the later case, > perhaps > > > if you describe how the matrix entries come about someone > > > would have suggestions on how to proceed. > > > > > > Barry > > > > > > > > > > > > > > > Thanks again > > > > > > > > Waad > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > 2:12 PM, Waad Subber wrote: > > > > > Hi, > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > adding up > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt > m,PetscInt > > > > n,MatReuse > > > > > scall,Mat *mpimat) > > > > > > > > > > to do that. However, when I compile the code I get the > following > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > collect2: ld returned 1 exit status > > > > > make: *** [all] Error 1 > > > > > > > > > > Am I using this function correctly ? > > > > > > > > These have no Fortran bindings right now. > > > > > > > > Matt > > > > > > > > > Thanks > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to > which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a > (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > From w_subber at yahoo.com Fri May 23 13:35:58 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 11:35:58 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> Message-ID: <673926.73668.qm@web38205.mail.mud.yahoo.com> Excellent...! thank you Barry Waad :) Barry Smith wrote: On May 23, 2008, at 11:20 AM, Waad Subber wrote: > Thank you for the useful comments. For sure I will consider them. > I've just starting my research by writing a DDM substructuring code > which scales for now up-to 60 CPUs using petsc KSP solver for the > interface problem and Lapack direct factorization for the interior > problem. I split the domain using METIS library and assign each > subdomain to one process then solve the global Schur complement > using parallel preconditioned iterative solver. As an initial > attempt, I solved a 2D elasticity problem (about 100,000 DOFs) > within seconds using this algorithm. 
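To illustrate Barry's advice above (one parallel MPIAIJ matrix filled with MatSetValues so that the assembled result is Sum(A_i)), here is a rough C sketch. It is not either author's code: Aseq is the local sequential matrix, loc2glob[] a hypothetical local-to-global index map, and the sizes and preallocation counts (mlocal, nlocal, M, N, d_nz, o_nz) are application-dependent assumptions. Error checking is omitted.

   #include "petscmat.h"

   Mat                A;          /* the global MPIAIJ matrix */
   PetscInt           i, j, nrows, ncols, grow, *gcols;
   const PetscInt    *cols;
   const PetscScalar *vals;

   MatCreateMPIAIJ(PETSC_COMM_WORLD, mlocal, nlocal, M, N,
                   d_nz, PETSC_NULL, o_nz, PETSC_NULL, &A); /* called MatCreateAIJ in
                                                               recent PETSc releases  */

   MatGetSize(Aseq, &nrows, PETSC_NULL);
   for (i = 0; i < nrows; i++) {
     MatGetRow(Aseq, i, &ncols, &cols, &vals);
     PetscMalloc(ncols * sizeof(PetscInt), &gcols);
     grow = loc2glob[i];                                /* hypothetical index map     */
     for (j = 0; j < ncols; j++) gcols[j] = loc2glob[cols[j]];
     MatSetValues(A, 1, &grow, ncols, gcols, vals, ADD_VALUES); /* overlapping entries
                                                                   are summed          */
     PetscFree(gcols);
     MatRestoreRow(Aseq, i, &ncols, &cols, &vals);
   }
   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

With ADD_VALUES, entries contributed by several processes are summed during assembly, which is exactly the Sum(A_i) operation asked about; good preallocation (d_nz/o_nz or the per-row arrays) matters a great deal for the speed of MatSetValues.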
I notice Lapack solver for the > interior problem takes a lot of time compare to the iterative solver > for the interface, so now I am replacing the direct factorization > with petsc KSP solver. In my experience you will want to use a KSP direct solve, for example just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur complement is always way to slow. Barry > > > I would like very much to have a look at your implementation, and I > think that will be very useful to me. > > Thanks > > Waad > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber > wrote: > > 1) How do you actually get the local Schur complements. You > > explicitelly compute its entries, or do you compute it after > computing > > the inverse (or LU factors) of a 'local' matrix? > > > > I construct the local Schur complement matrices after getting the > inversion > > of A_II matrix for each subdomain. > > Fine, > > > 2) Your R_i matrix is actually a matrix? In that case, it is a > trivial > > restrinction operation with ones and zeros? Or R_i is actually a > > VecScatter? > > > > R_i is the restriction matrix maps the global boundary nodes to > the local > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > so only I need to store the nonzero entries which one entry per a > row > > I believe a VecScatter will perform much better for this task. > > > > And finally: are you trying to apply a Krylov method over the global > > Schur complement? In such a case, are you going to implement a > > preconditioner for it? > > > > Yes, that what I am trying to do > > Well, please let me make some comments. I've spent many days and month > optimizing Schur complement iterations, and I ended giving up. I was > never able to get it perform better than ASM preconditioner (iff > appropriatelly used, ie. solving local problems with LU, and > implementing subdomain subpartitioning the smart way, not the way > currently implemented in PETSc, were subpartitioning is done by chunks > of continuous rows). > > If you are doing research on this, I would love to know your > conclusion when you get your work done. If you are doing all this just > with the hope of getting better running times, well, remember my above > comments but also remember that I do not consider myself a smart guy > ;-) > > As I said before, I worked hard for implementing general Schur > complement iteration. All this code is avalable in the SVN repository > of petsc4py (PETSc for Python), but it could be easily stripped out > for use in any PETSc-based code in C/C++. This implementation requires > the use of a MATIS matrix type (there is also a separate > implementation for MATMPIAIJ maatrices), I've implemented subdomain > subpartitioning (using a simple recursive graph splitting procedure > reusing matrix reordering routines built-in in PETSc, could be done > better with METIS); when the A_ii problems are large, their LU > factorization can be a real bootleneck. I've even implemented a > global preconditioner operation for the interface problem, based on > iterating over a 'strip' of nodes around the interface; it improves > convergence and is usefull for ill-conditioned systems, but the costs > are increased. > > If you ever want to take a look at my implemention for try to use it, > or perhaps take ideas for your own implementation, let me know. > > > > > > > > Now having the Schur complement matrix for each subdomain, I > need to solve > > > the interface problem > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > .. i=1.. to No. 
of process (subdomains) in parallel. > > > > > > For the global vector I construct one MPI vector and use > VecGetArray () > > for > > > each of the sequential vector then use VecSetValues () to add > the values > > > into the global MPI vector. That works fine. > > > > > > However for the global schur complement matix I try the same > idea by > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > MatSetValues () in order to add the values to the global matrix. > > > MatGetArray( ) gives me only the values without indices, so I > don't know > > how > > > to add these valuse to the global MPI matrix. > > > > > > Thanks agin > > > > > > Waad > > > > > > Barry Smith wrote: > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > Thank you Matt, > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I > want to > > > > solve a linear system: > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > vector. Each CPU has one matrix and one vector of the same > size. Now > > > > I want to sum up and solve the system in parallel. > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > part of the matrix? Or does each process have values > > > scattered all around the matrix? > > > > > > In the former case you should simply create one parallel MPIAIJ > > > matrix and call MatSetValues() to put the values > > > into it. We don't have any kind of support for the later case, > perhaps > > > if you describe how the matrix entries come about someone > > > would have suggestions on how to proceed. > > > > > > Barry > > > > > > > > > > > > > > > Thanks again > > > > > > > > Waad > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > 2:12 PM, Waad Subber wrote: > > > > > Hi, > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > adding up > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt > m,PetscInt > > > > n,MatReuse > > > > > scall,Mat *mpimat) > > > > > > > > > > to do that. However, when I compile the code I get the > following > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > collect2: ld returned 1 exit status > > > > > make: *** [all] Error 1 > > > > > > > > > > Am I using this function correctly ? > > > > > > > > These have no Fortran bindings right now. > > > > > > > > Matt > > > > > > > > > Thanks > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to > which > > > > their experiments lead. 
> > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a > (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Fri May 23 13:45:22 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 23 May 2008 15:45:22 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> References: <212555.70047.qm@web38205.mail.mud.yahoo.com> <1C08BF04-9463-465A-B926-BD1D5DA7122E@mcs.anl.gov> Message-ID: On 5/23/08, Barry Smith wrote: > I notice Lapack solver for the interior > problem takes a lot of time compare to the iterative solver for the > interface, so now I am replacing the direct factorization with petsc KSP > solver. > > > > In my experience you will want to use a KSP direct solve, for example > just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur > complement is > always way to slow. Indeed!! Waad, do not do that. If your interior problem is big, the only way to go is to implement subdomain subpartitioning. That is, in your local subdomain, you have to select some DOF's and 'mark' them as 'interface' nodes. Then you end-up having many A_ii's at a local subdomain, each corresponding with a sub-subdomain... Of course, implementing all this logic is not trivial at all.. My implementation of all this stuff can be found in petsc4py SVN repository hosted at Google Code. Here you have the (long) link: http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ In directory 'schur' you have a version requiring MATIS matrix type (this is somewhat natural in the context of substructuring and finite elements methods). This corresponds to what Y.Sadd's book calls 'edge-based' partitionings. In directory 'schur_aij' hou have a version (never carefully tested) working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's book calls 'vertex-based' partitionings (more typical to appear in finite difference methods). > > I would like very much to have a look at your implementation, and I think > that will be very useful to me. > > > > Thanks > > > > Waad > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > 1) How do you actually get the local Schur complements. You > > > explicitelly compute its entries, or do you compute it after computing > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > I construct the local Schur complement matrices after getting the > inversion > > > of A_II matrix for each subdomain. > > > > Fine, > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > restrinction operation with ones and zeros? Or R_i is actually a > > > VecScatter? 
> > > > > > R_i is the restriction matrix maps the global boundary nodes to the > local > > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > > so only I need to store the nonzero entries which one entry per a row > > > > I believe a VecScatter will perform much better for this task. > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > Schur complement? In such a case, are you going to implement a > > > preconditioner for it? > > > > > > Yes, that what I am trying to do > > > > Well, please let me make some comments. I've spent many days and month > > optimizing Schur complement iterations, and I ended giving up. I was > > never able to get it perform better than ASM preconditioner (iff > > appropriatelly used, ie. solving local problems with LU, and > > implementing subdomain subpartitioning the smart way, not the way > > currently implemented in PETSc, were subpartitioning is done by chunks > > of continuous rows). > > > > If you are doing research on this, I would love to know your > > conclusion when you get your work done. If you are doing all this just > > with the hope of getting better running times, well, remember my above > > comments but also remember that I do not consider myself a smart guy > > ;-) > > > > As I said before, I worked hard for implementing general Schur > > complement iteration. All this code is avalable in the SVN repository > > of petsc4py (PETSc for Python), but it could be easily stripped out > > for use in any PETSc-based code in C/C++. This implementation requires > > the use of a MATIS matrix type (there is also a separate > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > subpartitioning (using a simple recursive graph splitting procedure > > reusing matrix reordering routines built-in in PETSc, could be done > > better with METIS); when the A_ii problems are large, their LU > > factorization can be a real bootleneck. I've even implemented a > > global preconditioner operation for the interface problem, based on > > iterating over a 'strip' of nodes around the interface; it improves > > convergence and is usefull for ill-conditioned systems, but the costs > > are increased. > > > > If you ever want to take a look at my implemention for try to use it, > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > solve > > > > the interface problem > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > () > > > for > > > > each of the sequential vector then use VecSetValues () to add the > values > > > > into the global MPI vector. That works fine. > > > > > > > > However for the global schur complement matix I try the same idea by > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > MatSetValues () in order to add the values to the global matrix. > > > > MatGetArray( ) gives me only the values without indices, so I don't > know > > > how > > > > to add these valuse to the global MPI matrix. > > > > > > > > Thanks agin > > > > > > > > Waad > > > > > > > > Barry Smith wrote: > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > Thank you Matt, > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. 
I want to > > > > > solve a linear system: > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > > > I want to sum up and solve the system in parallel. > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > part of the matrix? Or does each process have values > > > > scattered all around the matrix? > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > matrix and call MatSetValues() to put the values > > > > into it. We don't have any kind of support for the later case, perhaps > > > > if you describe how the matrix entries come about someone > > > > would have suggestions on how to proceed. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > Waad > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > 2:12 PM, Waad Subber wrote: > > > > > > Hi, > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > adding up > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > n,MatReuse > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > collect2: ld returned 1 exit status > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > Matt > > > > > > > > > > > Thanks > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. 
> > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Fri May 23 13:59:24 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 11:59:24 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <321455.84470.qm@web38207.mail.mud.yahoo.com> Thank you Lisandro, I like the idea of constructing Schur complement of Schur complement matrix. I will give it a try. Thanks again for the suggestions and the link Waad :) Lisandro Dalcin wrote: On 5/23/08, Barry Smith wrote: > I notice Lapack solver for the interior > problem takes a lot of time compare to the iterative solver for the > interface, so now I am replacing the direct factorization with petsc KSP > solver. > > > > In my experience you will want to use a KSP direct solve, for example > just -ksp_type preonly -pc_type lu; using an iterative solver in such Schur > complement is > always way to slow. Indeed!! Waad, do not do that. If your interior problem is big, the only way to go is to implement subdomain subpartitioning. That is, in your local subdomain, you have to select some DOF's and 'mark' them as 'interface' nodes. Then you end-up having many A_ii's at a local subdomain, each corresponding with a sub-subdomain... Of course, implementing all this logic is not trivial at all.. My implementation of all this stuff can be found in petsc4py SVN repository hosted at Google Code. Here you have the (long) link: http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ In directory 'schur' you have a version requiring MATIS matrix type (this is somewhat natural in the context of substructuring and finite elements methods). This corresponds to what Y.Sadd's book calls 'edge-based' partitionings. In directory 'schur_aij' hou have a version (never carefully tested) working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's book calls 'vertex-based' partitionings (more typical to appear in finite difference methods). > > I would like very much to have a look at your implementation, and I think > that will be very useful to me. > > > > Thanks > > > > Waad > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > 1) How do you actually get the local Schur complements. You > > > explicitelly compute its entries, or do you compute it after computing > > > the inverse (or LU factors) of a 'local' matrix? 
> > > > > > I construct the local Schur complement matrices after getting the > inversion > > > of A_II matrix for each subdomain. > > > > Fine, > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > restrinction operation with ones and zeros? Or R_i is actually a > > > VecScatter? > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > local > > > boundary nodes and its entries is zero and one I store it as spare > matrix, > > > so only I need to store the nonzero entries which one entry per a row > > > > I believe a VecScatter will perform much better for this task. > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > Schur complement? In such a case, are you going to implement a > > > preconditioner for it? > > > > > > Yes, that what I am trying to do > > > > Well, please let me make some comments. I've spent many days and month > > optimizing Schur complement iterations, and I ended giving up. I was > > never able to get it perform better than ASM preconditioner (iff > > appropriatelly used, ie. solving local problems with LU, and > > implementing subdomain subpartitioning the smart way, not the way > > currently implemented in PETSc, were subpartitioning is done by chunks > > of continuous rows). > > > > If you are doing research on this, I would love to know your > > conclusion when you get your work done. If you are doing all this just > > with the hope of getting better running times, well, remember my above > > comments but also remember that I do not consider myself a smart guy > > ;-) > > > > As I said before, I worked hard for implementing general Schur > > complement iteration. All this code is avalable in the SVN repository > > of petsc4py (PETSc for Python), but it could be easily stripped out > > for use in any PETSc-based code in C/C++. This implementation requires > > the use of a MATIS matrix type (there is also a separate > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > subpartitioning (using a simple recursive graph splitting procedure > > reusing matrix reordering routines built-in in PETSc, could be done > > better with METIS); when the A_ii problems are large, their LU > > factorization can be a real bootleneck. I've even implemented a > > global preconditioner operation for the interface problem, based on > > iterating over a 'strip' of nodes around the interface; it improves > > convergence and is usefull for ill-conditioned systems, but the costs > > are increased. > > > > If you ever want to take a look at my implemention for try to use it, > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > solve > > > > the interface problem > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > () > > > for > > > > each of the sequential vector then use VecSetValues () to add the > values > > > > into the global MPI vector. That works fine. > > > > > > > > However for the global schur complement matix I try the same idea by > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > MatSetValues () in order to add the values to the global matrix. 
> > > > MatGetArray( ) gives me only the values without indices, so I don't > know > > > how > > > > to add these valuse to the global MPI matrix. > > > > > > > > Thanks agin > > > > > > > > Waad > > > > > > > > Barry Smith wrote: > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > Thank you Matt, > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want to > > > > > solve a linear system: > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > vector. Each CPU has one matrix and one vector of the same size. Now > > > > > I want to sum up and solve the system in parallel. > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > part of the matrix? Or does each process have values > > > > scattered all around the matrix? > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > matrix and call MatSetValues() to put the values > > > > into it. We don't have any kind of support for the later case, perhaps > > > > if you describe how the matrix entries come about someone > > > > would have suggestions on how to proceed. > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > Waad > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > 2:12 PM, Waad Subber wrote: > > > > > > Hi, > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > adding up > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > n,MatReuse > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > collect2: ld returned 1 exit status > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > Matt > > > > > > > > > > > Thanks > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to which > > > > > their experiments lead. 
> > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > -- > > Lisandro Dalc?n > > --------------- > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Fri May 23 15:04:51 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 23 May 2008 17:04:51 -0300 Subject: MatMerge_SeqsToMPI In-Reply-To: <321455.84470.qm@web38207.mail.mud.yahoo.com> References: <321455.84470.qm@web38207.mail.mud.yahoo.com> Message-ID: On 5/23/08, Waad Subber wrote: > Thank you Lisandro, > > I like the idea of constructing Schur complement of Schur complement matrix. > I will give it a try. Mmm, I was not actually talking about a Schur complemt of a Schur complement. In fact, I suggested that the interface nodes have to be extended with carefully selected interior nodes. This way, all this is equivalent to solving a problem with, let say, 1000 'logical' subdomains, but using let say 100 processors, were the actual local subdomains are splited to have about 10 subdomains per processor. Am I being clear enough? > > Thanks again for the suggestions and the link Enjoy! And please do not blame me for such a contrived code! > > Lisandro Dalcin wrote: > On 5/23/08, Barry Smith wrote: > > I notice Lapack solver for the interior > > problem takes a lot of time compare to the iterative solver for the > > interface, so now I am replacing the direct factorization with petsc KSP > > solver. > > > > > > > In my experience you will want to use a KSP direct solve, for example > > just -ksp_type preonly -pc_type lu; using an iterative solver in such > Schur > > complement is > > always way to slow. > > Indeed!! > > Waad, do not do that. If your interior problem is big, the only way to > go is to implement subdomain subpartitioning. That is, in your local > subdomain, you have to select some DOF's and 'mark' them as > 'interface' nodes. Then you end-up having many A_ii's at a local > subdomain, each corresponding with a sub-subdomain... Of course, > implementing all this logic is not trivial at all.. > > My implementation of all this stuff can be found in petsc4py SVN > repository hosted at Google Code. Here you have the (long) link: > > http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ > > > In directory 'schur' you have a version requiring MATIS matrix type > (this is somewhat natural in the context of substructuring and finite > elements methods). 
This corresponds to what Y.Sadd's book calls > 'edge-based' partitionings. > > In directory 'schur_aij' hou have a version (never carefully tested) > working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's > book calls 'vertex-based' partitionings (more typical to appear in > finite difference methods). > > > > > > > > > I would like very much to have a look at your implementation, and I > think > > that will be very useful to me. > > > > > > Thanks > > > > > > Waad > > > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > > 1) How do you actually get the local Schur complements. You > > > > explicitelly compute its entries, or do you compute it after computing > > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > > > I construct the local Schur complement matrices after getting the > > inversion > > > > of A_II matrix for each subdomain. > > > > > > Fine, > > > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > > restrinction operation with ones and zeros? Or R_i is actually a > > > > VecScatter? > > > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > > local > > > > boundary nodes and its entries is zero and one I store it as spare > > matrix, > > > > so only I need to store the nonzero entries which one entry per a row > > > > > > I believe a VecScatter will perform much better for this task. > > > > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > > Schur complement? In such a case, are you going to implement a > > > > preconditioner for it? > > > > > > > > Yes, that what I am trying to do > > > > > > Well, please let me make some comments. I've spent many days and month > > > optimizing Schur complement iterations, and I ended giving up. I was > > > never able to get it perform better than ASM preconditioner (iff > > > appropriatelly used, ie. solving local problems with LU, and > > > implementing subdomain subpartitioning the smart way, not the way > > > currently implemented in PETSc, were subpartitioning is done by chunks > > > of continuous rows). > > > > > > If you are doing research on this, I would love to know your > > > conclusion when you get your work done. If you are doing all this just > > > with the hope of getting better running times, well, remember my above > > > comments but also remember that I do not consider myself a smart guy > > > ;-) > > > > > > As I said before, I worked hard for implementing general Schur > > > complement iteration. All this code is avalable in the SVN repository > > > of petsc4py (PETSc for Python), but it could be easily stripped out > > > for use in any PETSc-based code in C/C++. This implementation requires > > > the use of a MATIS matrix type (there is also a separate > > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > > subpartitioning (using a simple recursive graph splitting procedure > > > reusing matrix reordering routines built-in in PETSc, could be done > > > better with METIS); when the A_ii problems are large, their LU > > > factorization can be a real bootleneck. I've even implemented a > > > global preconditioner operation for the interface problem, based on > > > iterating over a 'strip' of nodes around the interface; it improves > > > convergence and is usefull for ill-conditioned systems, but the costs > > > are increased. 
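As a generic illustration of running a Krylov method on the interface operator Sum_i R_i^T S_i R_i without assembling it, one option is a shell matrix. This is only a sketch, not the petsc4py implementation referred to above; the context fields (the scatter playing the role of R_i, the local Schur complement S_local and the work vectors) are assumptions, and error checking is omitted.

   #include "petscksp.h"

   typedef struct {
     VecScatter scat;      /* plays the role of R_i / R_i^T        */
     Mat        S_local;   /* local Schur complement S_i           */
     Vec        xl, yl;    /* work vectors of local boundary size  */
   } InterfaceCtx;

   /* y = Sum_i R_i^T S_i R_i x, applied matrix-free */
   PetscErrorCode InterfaceMult(Mat A, Vec x, Vec y)
   {
     InterfaceCtx *ctx;
     MatShellGetContext(A, (void **)&ctx);
     VecScatterBegin(ctx->scat, x, ctx->xl, INSERT_VALUES, SCATTER_FORWARD);
     VecScatterEnd(ctx->scat, x, ctx->xl, INSERT_VALUES, SCATTER_FORWARD);
     MatMult(ctx->S_local, ctx->xl, ctx->yl);                /* S_i (R_i x)  */
     VecZeroEntries(y);
     VecScatterBegin(ctx->scat, ctx->yl, y, ADD_VALUES, SCATTER_REVERSE);
     VecScatterEnd(ctx->scat, ctx->yl, y, ADD_VALUES, SCATTER_REVERSE);
     return 0;
   }

   PetscErrorCode CreateInterfaceOperator(InterfaceCtx *ctx, PetscInt nb_owned, Mat *S)
   {
     MatCreateShell(PETSC_COMM_WORLD, nb_owned, nb_owned,
                    PETSC_DETERMINE, PETSC_DETERMINE, (void *)ctx, S);
     MatShellSetOperation(*S, MATOP_MULT, (void (*)(void))InterfaceMult);
     return 0;
   }

The shell matrix can then be handed to KSPSetOperators() like any assembled matrix; the price, as the discussion above makes clear, is that preconditioning such an operator is the hard part.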
> > > > > > If you ever want to take a look at my implemention for try to use it, > > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > > solve > > > > > the interface problem > > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > > () > > > > for > > > > > each of the sequential vector then use VecSetValues () to add the > > values > > > > > into the global MPI vector. That works fine. > > > > > > > > > > However for the global schur complement matix I try the same idea by > > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > > MatSetValues () in order to add the values to the global matrix. > > > > > MatGetArray( ) gives me only the values without indices, so I don't > > know > > > > how > > > > > to add these valuse to the global MPI matrix. > > > > > > > > > > Thanks agin > > > > > > > > > > Waad > > > > > > > > > > Barry Smith wrote: > > > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > > > Thank you Matt, > > > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want > to > > > > > > solve a linear system: > > > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > > vector. Each CPU has one matrix and one vector of the same size. > Now > > > > > > I want to sum up and solve the system in parallel. > > > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > > part of the matrix? Or does each process have values > > > > > scattered all around the matrix? > > > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > > matrix and call MatSetValues() to put the values > > > > > into it. We don't have any kind of support for the later case, > perhaps > > > > > if you describe how the matrix entries come about someone > > > > > would have suggestions on how to proceed. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > > > Waad > > > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > > 2:12 PM, Waad Subber wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > > adding up > > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > > n,MatReuse > > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > > collect2: ld returned 1 exit status > > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > > > These have no Fortran bindings right now. > > > > > > > > > > > > Matt > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to > which > > > > > > their experiments lead. 
> > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Lisandro Dalc?n > > > > --------------- > > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From w_subber at yahoo.com Fri May 23 15:30:40 2008 From: w_subber at yahoo.com (Waad Subber) Date: Fri, 23 May 2008 13:30:40 -0700 (PDT) Subject: MatMerge_SeqsToMPI In-Reply-To: Message-ID: <790235.898.qm@web38206.mail.mud.yahoo.com> I see what do you mean Thanks :) Lisandro Dalcin wrote: On 5/23/08, Waad Subber wrote: > Thank you Lisandro, > > I like the idea of constructing Schur complement of Schur complement matrix. > I will give it a try. Mmm, I was not actually talking about a Schur complemt of a Schur complement. In fact, I suggested that the interface nodes have to be extended with carefully selected interior nodes. This way, all this is equivalent to solving a problem with, let say, 1000 'logical' subdomains, but using let say 100 processors, were the actual local subdomains are splited to have about 10 subdomains per processor. Am I being clear enough? > > Thanks again for the suggestions and the link Enjoy! And please do not blame me for such a contrived code! > > Lisandro Dalcin wrote: > On 5/23/08, Barry Smith wrote: > > I notice Lapack solver for the interior > > problem takes a lot of time compare to the iterative solver for the > > interface, so now I am replacing the direct factorization with petsc KSP > > solver. > > > > > > > In my experience you will want to use a KSP direct solve, for example > > just -ksp_type preonly -pc_type lu; using an iterative solver in such > Schur > > complement is > > always way to slow. > > Indeed!! > > Waad, do not do that. If your interior problem is big, the only way to > go is to implement subdomain subpartitioning. That is, in your local > subdomain, you have to select some DOF's and 'mark' them as > 'interface' nodes. Then you end-up having many A_ii's at a local > subdomain, each corresponding with a sub-subdomain... Of course, > implementing all this logic is not trivial at all.. 
> > My implementation of all this stuff can be found in petsc4py SVN > repository hosted at Google Code. Here you have the (long) link: > > http://petsc4py.googlecode.com/svn/trunk/petsc/lib/ext/petsc/src/ksp/pc/impls/ > > > In directory 'schur' you have a version requiring MATIS matrix type > (this is somewhat natural in the context of substructuring and finite > elements methods). This corresponds to what Y.Sadd's book calls > 'edge-based' partitionings. > > In directory 'schur_aij' hou have a version (never carefully tested) > working with MATMPIAIJ matrices. This corresponds to what Y.Sadd's > book calls 'vertex-based' partitionings (more typical to appear in > finite difference methods). > > > > > > > > > I would like very much to have a look at your implementation, and I > think > > that will be very useful to me. > > > > > > Thanks > > > > > > Waad > > > > > > Lisandro Dalcin wrote: On 5/20/08, Waad Subber wrote: > > > > 1) How do you actually get the local Schur complements. You > > > > explicitelly compute its entries, or do you compute it after computing > > > > the inverse (or LU factors) of a 'local' matrix? > > > > > > > > I construct the local Schur complement matrices after getting the > > inversion > > > > of A_II matrix for each subdomain. > > > > > > Fine, > > > > > > > 2) Your R_i matrix is actually a matrix? In that case, it is a trivial > > > > restrinction operation with ones and zeros? Or R_i is actually a > > > > VecScatter? > > > > > > > > R_i is the restriction matrix maps the global boundary nodes to the > > local > > > > boundary nodes and its entries is zero and one I store it as spare > > matrix, > > > > so only I need to store the nonzero entries which one entry per a row > > > > > > I believe a VecScatter will perform much better for this task. > > > > > > > > > > And finally: are you trying to apply a Krylov method over the global > > > > Schur complement? In such a case, are you going to implement a > > > > preconditioner for it? > > > > > > > > Yes, that what I am trying to do > > > > > > Well, please let me make some comments. I've spent many days and month > > > optimizing Schur complement iterations, and I ended giving up. I was > > > never able to get it perform better than ASM preconditioner (iff > > > appropriatelly used, ie. solving local problems with LU, and > > > implementing subdomain subpartitioning the smart way, not the way > > > currently implemented in PETSc, were subpartitioning is done by chunks > > > of continuous rows). > > > > > > If you are doing research on this, I would love to know your > > > conclusion when you get your work done. If you are doing all this just > > > with the hope of getting better running times, well, remember my above > > > comments but also remember that I do not consider myself a smart guy > > > ;-) > > > > > > As I said before, I worked hard for implementing general Schur > > > complement iteration. All this code is avalable in the SVN repository > > > of petsc4py (PETSc for Python), but it could be easily stripped out > > > for use in any PETSc-based code in C/C++. This implementation requires > > > the use of a MATIS matrix type (there is also a separate > > > implementation for MATMPIAIJ maatrices), I've implemented subdomain > > > subpartitioning (using a simple recursive graph splitting procedure > > > reusing matrix reordering routines built-in in PETSc, could be done > > > better with METIS); when the A_ii problems are large, their LU > > > factorization can be a real bootleneck. 
I've even implemented a > > > global preconditioner operation for the interface problem, based on > > > iterating over a 'strip' of nodes around the interface; it improves > > > convergence and is usefull for ill-conditioned systems, but the costs > > > are increased. > > > > > > If you ever want to take a look at my implemention for try to use it, > > > or perhaps take ideas for your own implementation, let me know. > > > > > > > > > > > > > > > > > > > > Now having the Schur complement matrix for each subdomain, I need to > > solve > > > > > the interface problem > > > > (Sum[R_i^T*S_i*R_i])u=Sum[R_i^T*g_i], > > > > > .. i=1.. to No. of process (subdomains) in parallel. > > > > > > > > > > For the global vector I construct one MPI vector and use VecGetArray > > () > > > > for > > > > > each of the sequential vector then use VecSetValues () to add the > > values > > > > > into the global MPI vector. That works fine. > > > > > > > > > > However for the global schur complement matix I try the same idea by > > > > > creating one parallel MPIAIJ matrix and using MatGetArray( ) and > > > > > MatSetValues () in order to add the values to the global matrix. > > > > > MatGetArray( ) gives me only the values without indices, so I don't > > know > > > > how > > > > > to add these valuse to the global MPI matrix. > > > > > > > > > > Thanks agin > > > > > > > > > > Waad > > > > > > > > > > Barry Smith wrote: > > > > > > > > > > On May 20, 2008, at 3:16 PM, Waad Subber wrote: > > > > > > > > > > > Thank you Matt, > > > > > > > > > > > > Any suggestion to solve the problem I am trying to tackle. I want > to > > > > > > solve a linear system: > > > > > > > > > > > > Sum(A_i) u= Sum(f_i) , i=1.... to No. of CPUs. > > > > > > > > > > > > Where A_i is a sparse sequential matrix and f_i is a sequential > > > > > > vector. Each CPU has one matrix and one vector of the same size. > Now > > > > > > I want to sum up and solve the system in parallel. > > > > > > > > > > Does each A_i have nonzero entries (mostly) associated with one > > > > > part of the matrix? Or does each process have values > > > > > scattered all around the matrix? > > > > > > > > > > In the former case you should simply create one parallel MPIAIJ > > > > > matrix and call MatSetValues() to put the values > > > > > into it. We don't have any kind of support for the later case, > perhaps > > > > > if you describe how the matrix entries come about someone > > > > > would have suggestions on how to proceed. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > Thanks again > > > > > > > > > > > > Waad > > > > > > > > > > > > Matthew Knepley wrote: On Tue, May 20, 2008 at > > > > > > 2:12 PM, Waad Subber wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I am trying to construct a sparse parallel matrix (MPIAIJ) by > > > > > > adding up > > > > > > > sparse sequential matrices (SeqAIJ) from each CPU. I am using > > > > > > > > > > > > > > MatMerge_SeqsToMPI(MPI_Comm comm,Mat seqmat,PetscInt m,PetscInt > > > > > > n,MatReuse > > > > > > > scall,Mat *mpimat) > > > > > > > > > > > > > > to do that. However, when I compile the code I get the following > > > > > > > > > > > > > > undefined reference to `matmerge_seqstompi_' > > > > > > > collect2: ld returned 1 exit status > > > > > > > make: *** [all] Error 1 > > > > > > > > > > > > > > Am I using this function correctly ? > > > > > > > > > > > > These have no Fortran bindings right now. 
> > > > > > > > > > > > Matt > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Waad > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to > which > > > > > > their experiments lead. > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Lisandro Dalc?n > > > > --------------- > > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Sun May 25 05:43:19 2008 From: zonexo at gmail.com (Ben Tay) Date: Sun, 25 May 2008 18:43:19 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix Message-ID: <483942C7.7080705@gmail.com> Hi, I have an old serial code and a newer parallel code. The new parallel code is converted from the old serial code. However, due to numerous changes, the answers from the new code now differs from the old one after the 1st step. What is the best way to compare the matrices from the 2 different code? I guess the most direct mtd is to use MatView to store the matrix in a ACSII file and spot the difference between the 2 files. However, I can't seem to get it right. What I did is: PetscViewer viewer call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) call MatView(A_mat_uv,viewer,ierr) call PetscViewerDestroy(viewer,ierr) call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) However, I get the error that "PetscViewer viewer" has syntax error. Hope you can help me out. Thank you very much. 
Regards From bsmith at mcs.anl.gov Sun May 25 08:55:12 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 25 May 2008 08:55:12 -0500 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483942C7.7080705@gmail.com> References: <483942C7.7080705@gmail.com> Message-ID: <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> Likely you forgot to #include "finclude/petscviewer.h" in the Fortran subroutine that is doing this stuff. On May 25, 2008, at 5:43 AM, Ben Tay wrote: > Hi, > > I have an old serial code and a newer parallel code. The new > parallel code is converted from the old serial code. However, due to > numerous changes, the answers from the new code now differs from the > old one after the 1st step. What is the best way to compare the > matrices from the 2 different code? > > I guess the most direct mtd is to use MatView to store the matrix in > a ACSII file and spot the difference between the 2 files. However, I > can't seem to get it right. What I did is: > > PetscViewer viewer > > call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) > > call MatView(A_mat_uv,viewer,ierr) > > call PetscViewerDestroy(viewer,ierr) > > call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) > > However, I get the error that "PetscViewer viewer" has syntax error. > Hope you can help me out. > > Thank you very much. > > Regards > > From dalcinl at gmail.com Mon May 26 10:47:11 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 26 May 2008 12:47:11 -0300 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483942C7.7080705@gmail.com> References: <483942C7.7080705@gmail.com> Message-ID: On 5/25/08, Ben Tay wrote: > What is the best way to compare the matrices from the 2 different code? > > I guess the most direct mtd is to use MatView to store the matrix in a > ACSII file and spot the difference between the 2 files. However, I can't > seem to get it right. What I did is: But then take into account that in the parallel case, the numbering of the rows/cols of the matrix will perhaps not match the one of the sequential case. If you save the parallel matrix, you will also need to save an appropriate permutation to be able to actually compare the matrices. Or perhaps better, a combination of an application ordering, an index set, and MatPermute(). -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From zonexo at gmail.com Tue May 27 05:57:08 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 18:57:08 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> Message-ID: <483BE904.4090608@gmail.com> Hi Barry, Thanks for pointing out. However, I only got a 0 byte file after adding the include statement. I am programming in parallel and my matrix is created using MatCreateMPIAIJ. Did I missed out some commands? PetscViewer viewer ... call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) call MatView(A_mat_uv,viewer,ierr) call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) call PetscViewerDestroy(viewer,ierr) Thank you very much. Regards. 
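The zero-byte file reported above appears to come from the ordering of the calls: MatView() runs before the viewer has a type and a file attached, and the file opened afterwards by PetscViewerASCIIOpen() is never written to (Barry's reply below spells out the fix). In outline, shown here in C although the ordering is the same for the Fortran calls above, with A_mat_uv taken from the code above:

   #include "petscviewer.h"
   #include "petscmat.h"

   PetscViewer viewer;

   PetscViewerASCIIOpen(PETSC_COMM_WORLD, "matrix.txt", &viewer);  /* open first */
   MatView(A_mat_uv, viewer);                                      /* then write */
   PetscViewerDestroy(viewer);   /* PETSc 2.3.x; later releases take &viewer instead */

With this ordering there is no PetscViewerCreate() call at all, since PetscViewerASCIIOpen() creates and configures the viewer in one step.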
Barry Smith wrote: > > Likely you forgot to #include "finclude/petscviewer.h" in the > Fortran subroutine that > is doing this stuff. > > > On May 25, 2008, at 5:43 AM, Ben Tay wrote: > >> Hi, >> >> I have an old serial code and a newer parallel code. The new parallel >> code is converted from the old serial code. However, due to numerous >> changes, the answers from the new code now differs from the old one >> after the 1st step. What is the best way to compare the matrices from >> the 2 different code? >> >> I guess the most direct mtd is to use MatView to store the matrix in >> a ACSII file and spot the difference between the 2 files. However, I >> can't seem to get it right. What I did is: >> >> PetscViewer viewer >> >> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >> >> call MatView(A_mat_uv,viewer,ierr) >> >> call PetscViewerDestroy(viewer,ierr) >> >> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >> >> However, I get the error that "PetscViewer viewer" has syntax error. >> Hope you can help me out. >> >> Thank you very much. >> >> Regards >> >> > > From bsmith at mcs.anl.gov Tue May 27 07:52:01 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 07:52:01 -0500 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: <483BE904.4090608@gmail.com> References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> <483BE904.4090608@gmail.com> Message-ID: You use EITHER PetscViewerCreate() and PetscViewerSetType() and PetscViewerFileSetName() OR PetscViewerASCIIOpen() so use > call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >> call MatView(A_mat_uv,viewer,ierr) > call PetscViewerDestroy(viewer,ierr) On May 27, 2008, at 5:57 AM, Ben Tay wrote: > Hi Barry, > > Thanks for pointing out. However, I only got a 0 byte file after > adding the include statement. > > I am programming in parallel and my matrix is created using > MatCreateMPIAIJ. Did I missed out some commands? > > PetscViewer viewer > > > ... > > call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) > > call MatView(A_mat_uv,viewer,ierr) > > call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) > > call PetscViewerDestroy(viewer,ierr) > > Thank you very much. > > Regards. > > > Barry Smith wrote: >> >> Likely you forgot to #include "finclude/petscviewer.h" in the >> Fortran subroutine that >> is doing this stuff. >> >> >> On May 25, 2008, at 5:43 AM, Ben Tay wrote: >> >>> Hi, >>> >>> I have an old serial code and a newer parallel code. The new >>> parallel code is converted from the old serial code. However, due >>> to numerous changes, the answers from the new code now differs >>> from the old one after the 1st step. What is the best way to >>> compare the matrices from the 2 different code? >>> >>> I guess the most direct mtd is to use MatView to store the matrix >>> in a ACSII file and spot the difference between the 2 files. >>> However, I can't seem to get it right. What I did is: >>> >>> PetscViewer viewer >>> >>> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >>> >>> call MatView(A_mat_uv,viewer,ierr) >>> >>> call PetscViewerDestroy(viewer,ierr) >>> >>> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >>> >>> However, I get the error that "PetscViewer viewer" has syntax >>> error. Hope you can help me out. >>> >>> Thank you very much. 
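For completeness, a sketch of the other branch Barry names above (PetscViewerCreate() plus PetscViewerSetType() plus PetscViewerFileSetName()), equivalent to the PetscViewerASCIIOpen() route sketched earlier; this is not code from the thread, and error checking is omitted.

   #include "petscviewer.h"

   PetscViewer viewer;

   PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
   PetscViewerSetType(viewer, PETSC_VIEWER_ASCII);  /* spelled PETSCVIEWERASCII in
                                                       recent PETSc releases        */
   PetscViewerFileSetName(viewer, "matrix.txt");
   MatView(A_mat_uv, viewer);
   PetscViewerDestroy(viewer);                      /* &viewer in recent releases   */

Either way, MatView() must come after the viewer is fully set up and before it is destroyed.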
>>> >>> Regards >>> >>> >> >> > > From Amit.Itagi at seagate.com Tue May 27 10:18:40 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 27 May 2008 11:18:40 -0400 Subject: Code structuring - Communicator In-Reply-To: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: Barry, I got a part of what I was trying to do (sub-communicator etc.), working. Now suppose I want to repeat a calculation with a different input, I have two ways of doing it (based on what I have coded). 1) MPI_Initialize Create a group using MPI_Comm_group Create several sub-groups and sub-communicators using MPI_Group_Incl and MPI_Comm_create Assign the sub-communicator to PETSC_COMM_WORLD // Calculation 1 { Do PetscInitialize Perform the calculation Do PetscFinalize } // Calculation 2 { Do PetscInitialize Perform the calculation Do PetscFinalize } Do MPI_finalize 2) MPI_Initialize Create a group using MPI_Comm_group Create several sub-groups and sub-communicators using MPI_Group_Incl and MPI_Comm_create Assign the sub-communicator to PETSC_COMM_WORLD Do PetscInitialize // Calculation 1 { Perform the calculation } // Calculation 2 { Perform the calculation } Do PetscFinalize Do MPI_finalize The first method crashes. I am trying to understand why. The documentation says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not called before PetscInitialize. In my case, is PetscFinalize destroying the sub-communicators ? Thanks Rgds, Amit Barry Smith To Sent by: petsc-users at mcs.anl.gov owner-petsc-users cc @mcs.anl.gov No Phone Info Subject Available Re: Code structuring - Communicator 05/09/2008 03:07 PM Please respond to petsc-users at mcs.a nl.gov There are many ways to do this, most of them involve using MPI to construct subcommunicators for the various sub parallel tasks. You very likely want to keep PetscInitialize() at the very beginning of the program; you would not write the calls in terms of PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the subcommunicators to create the objects. An alternative approach is to look at the manual page for PetscOpenMPMerge(), PetscOpenMPRun(), PetscOpenMPNew() in petsc-dev. These allow a simple master-worker model of parallelism with PETSc with a bunch of masters that can work together (instead of just one master) and each master controls a bunch of workers. The code in src/ksp/pc/impls/ openmp uses this code. Note that OpenMP has NOTHING to do with OpenMP the standard. Also I don't really have any support for Fortran, I hope you use C/C++. Comments welcome. It sounds like this matches what you need. It's pretty cool, but underdeveloped. Barry On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > Hi, > > I have a question about the Petsc communicator. I have a petsc program > "foo" which essentially runs in parallel and gives me > y=f(x1,x2,...), where > y is an output parameter and xi's are input parameters. Suppose, I > want to > run a parallel optimizer for the input parameters. I am looking at the > following functionality. I submit the optimizer job on 16 processors > (using > "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > "foo", each running parallely on 4 processors. "foo" will be written > as a > function and not as a main program in this case. How can I get this > functionality using Petsc ? Should PetscInitialize be called in the > optimizer, or in each foo run ? 
If PetscInitialize is called in the > optimizer, is there a way to make the foo function run only on a > subset of > the 16 processors ? > > May be, I haven't done a good job of explaining my problem. Let me > know if > you need any clarifications. > > Thanks > > Rgds, > Amit > From knepley at gmail.com Tue May 27 10:23:44 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 May 2008 10:23:44 -0500 Subject: Code structuring - Communicator In-Reply-To: References: <20A48E73-BDF6-4867-851B-311DBAD70844@mcs.anl.gov> Message-ID: On Tue, May 27, 2008 at 10:18 AM, wrote: > Barry, > > I got a part of what I was trying to do (sub-communicator etc.), working. > Now suppose I want to repeat a calculation with a different input, I have > two ways of doing it (based on what I have coded). > > 1) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > // Calculation 1 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > // Calculation 2 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > Do MPI_finalize > > 2) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > Do PetscInitialize > // Calculation 1 > { > Perform the calculation > } > // Calculation 2 > { > Perform the calculation > } > Do PetscFinalize > Do MPI_finalize > > > The first method crashes. I am trying to understand why. The documentation What do you mean "crashes" and what line does it happen on? You can use -start_in_debugger to get a stack trace. I do not completely understand your pseudocode, however, you should never call PetscInitialize()/Finalize() more than once. Matt > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not > called before PetscInitialize. In my case, is PetscFinalize destroying the > sub-communicators ? > > Thanks > > Rgds, > Amit > > > > > > Barry Smith > ov> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc-users cc > @mcs.anl.gov > No Phone Info Subject > Available Re: Code structuring - Communicator > > > 05/09/2008 03:07 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > There are many ways to do this, most of them involve using MPI to > construct subcommunicators > for the various sub parallel tasks. You very likely want to keep > PetscInitialize() at > the very beginning of the program; you would not write the calls in > terms of > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > subcommunicators to create the objects. > > An alternative approach is to look at the manual page for > PetscOpenMPMerge(), PetscOpenMPRun(), > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > model of parallelism > with PETSc with a bunch of masters that can work together (instead of > just one master) and each > master controls a bunch of workers. The code in src/ksp/pc/impls/ > openmp uses this code. > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > don't really have > any support for Fortran, I hope you use C/C++. Comments welcome. It > sounds like this matches > what you need. It's pretty cool, but underdeveloped. > > Barry > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > >> >> Hi, >> >> I have a question about the Petsc communicator. 
I have a petsc program >> "foo" which essentially runs in parallel and gives me >> y=f(x1,x2,...), where >> y is an output parameter and xi's are input parameters. Suppose, I >> want to >> run a parallel optimizer for the input parameters. I am looking at the >> following functionality. I submit the optimizer job on 16 processors >> (using >> "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of >> "foo", each running parallely on 4 processors. "foo" will be written >> as a >> function and not as a main program in this case. How can I get this >> functionality using Petsc ? Should PetscInitialize be called in the >> optimizer, or in each foo run ? If PetscInitialize is called in the >> optimizer, is there a way to make the foo function run only on a >> subset of >> the 16 processors ? >> >> May be, I haven't done a good job of explaining my problem. Let me >> know if >> you need any clarifications. >> >> Thanks >> >> Rgds, >> Amit >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From zonexo at gmail.com Tue May 27 10:23:56 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 23:23:56 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: References: <483942C7.7080705@gmail.com> <4C1BFC40-5235-42FF-8BE2-788FC9757B78@mcs.anl.gov> <483BE904.4090608@gmail.com> Message-ID: <483C278C.4030604@gmail.com> Hi Barry, Got it working. Thanks alot! Barry Smith wrote: > > You use EITHER PetscViewerCreate() and PetscViewerSetType() and > PetscViewerFileSetName() OR PetscViewerASCIIOpen() > so use > > >> call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >>> call MatView(A_mat_uv,viewer,ierr) >> call PetscViewerDestroy(viewer,ierr) > > > > On May 27, 2008, at 5:57 AM, Ben Tay wrote: > >> Hi Barry, >> >> Thanks for pointing out. However, I only got a 0 byte file after >> adding the include statement. >> >> I am programming in parallel and my matrix is created using >> MatCreateMPIAIJ. Did I missed out some commands? >> >> PetscViewer viewer >> >> >> ... >> >> call PetscViewerCreate(MPI_COMM_WORLD,viewer,ierr) >> >> call MatView(A_mat_uv,viewer,ierr) >> >> call PetscViewerASCIIOpen(MPI_COMM_WORLD, "matrix.txt",viewer,ierr) >> >> call PetscViewerDestroy(viewer,ierr) >> >> Thank you very much. >> >> Regards. >> >> >> Barry Smith wrote: >>> >>> Likely you forgot to #include "finclude/petscviewer.h" in the >>> Fortran subroutine that >>> is doing this stuff. >>> >>> >>> On May 25, 2008, at 5:43 AM, Ben Tay wrote: >>> >>>> Hi, >>>> >>>> I have an old serial code and a newer parallel code. The new >>>> parallel code is converted from the old serial code. However, due >>>> to numerous changes, the answers from the new code now differs from >>>> the old one after the 1st step. What is the best way to compare the >>>> matrices from the 2 different code? >>>> >>>> I guess the most direct mtd is to use MatView to store the matrix >>>> in a ACSII file and spot the difference between the 2 files. >>>> However, I can't seem to get it right. What I did is: >>>> >>>> PetscViewer viewer >>>> >>>> call PetscViewerCreate(PETSC_COMM_SELF,viewer,ierr) >>>> >>>> call MatView(A_mat_uv,viewer,ierr) >>>> >>>> call PetscViewerDestroy(viewer,ierr) >>>> >>>> call PetscViewerASCIIOpen(PETSC_COMM_SELF, "matrix.txt",viewer,ierr) >>>> >>>> However, I get the error that "PetscViewer viewer" has syntax >>>> error. 
Hope you can help me out. >>>> >>>> Thank you very much. >>>> >>>> Regards >>>> >>>> >>> >>> >> >> > > From bsmith at mcs.anl.gov Tue May 27 10:24:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 10:24:04 -0500 Subject: Code structuring - Communicator In-Reply-To: References: Message-ID: You cannot call PetscInitialize() twice. Barry On May 27, 2008, at 10:18 AM, Amit.Itagi at seagate.com wrote: > Barry, > > I got a part of what I was trying to do (sub-communicator etc.), > working. > Now suppose I want to repeat a calculation with a different input, I > have > two ways of doing it (based on what I have coded). > > 1) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl > and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > // Calculation 1 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > // Calculation 2 > { > Do PetscInitialize > Perform the calculation > Do PetscFinalize > } > Do MPI_finalize > > 2) > > MPI_Initialize > Create a group using MPI_Comm_group > Create several sub-groups and sub-communicators using MPI_Group_Incl > and > MPI_Comm_create > Assign the sub-communicator to PETSC_COMM_WORLD > Do PetscInitialize > // Calculation 1 > { > Perform the calculation > } > // Calculation 2 > { > Perform the calculation > } > Do PetscFinalize > Do MPI_finalize > > > The first method crashes. I am trying to understand why. The > documentation > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is > not > called before PetscInitialize. In my case, is PetscFinalize > destroying the > sub-communicators ? > > Thanks > > Rgds, > Amit > > > > > > Barry Smith > > ov> To > Sent by: petsc-users at mcs.anl.gov > owner-petsc- > users cc > @mcs.anl.gov > No Phone Info > Subject > Available Re: Code structuring - > Communicator > > > 05/09/2008 03:07 > PM > > > Please respond to > petsc-users at mcs.a > nl.gov > > > > > > > > There are many ways to do this, most of them involve using MPI to > construct subcommunicators > for the various sub parallel tasks. You very likely want to keep > PetscInitialize() at > the very beginning of the program; you would not write the calls in > terms of > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > subcommunicators to create the objects. > > An alternative approach is to look at the manual page for > PetscOpenMPMerge(), PetscOpenMPRun(), > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > model of parallelism > with PETSc with a bunch of masters that can work together (instead of > just one master) and each > master controls a bunch of workers. The code in src/ksp/pc/impls/ > openmp uses this code. > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > don't really have > any support for Fortran, I hope you use C/C++. Comments welcome. It > sounds like this matches > what you need. It's pretty cool, but underdeveloped. > > Barry > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > >> >> Hi, >> >> I have a question about the Petsc communicator. I have a petsc >> program >> "foo" which essentially runs in parallel and gives me >> y=f(x1,x2,...), where >> y is an output parameter and xi's are input parameters. Suppose, I >> want to >> run a parallel optimizer for the input parameters. I am looking at >> the >> following functionality. I submit the optimizer job on 16 processors >> (using >> "mpiexec -np 16 progName"). 
The optimizer should then submit 4 runs >> of >> "foo", each running parallely on 4 processors. "foo" will be written >> as a >> function and not as a main program in this case. How can I get this >> functionality using Petsc ? Should PetscInitialize be called in the >> optimizer, or in each foo run ? If PetscInitialize is called in the >> optimizer, is there a way to make the foo function run only on a >> subset of >> the 16 processors ? >> >> May be, I haven't done a good job of explaining my problem. Let me >> know if >> you need any clarifications. >> >> Thanks >> >> Rgds, >> Amit >> > > > > From zonexo at gmail.com Tue May 27 10:58:55 2008 From: zonexo at gmail.com (Ben Tay) Date: Tue, 27 May 2008 23:58:55 +0800 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error Message-ID: <483C2FBF.90509@gmail.com> Hi, I read in the manual that I can use either call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) or call MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my momentum eqn, I get the error on my home computer udring compiling: :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: This name does not have a type, and must have an explicit type. [MAT_NO_NEW_NONZERO_LOCATIONS] call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) Although this error does not happen when I compile in linux, I can this error during run: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! [0]PETSC ERROR: ------------------------------------------------------------------------ I use KSPRICHARDSON and PCILU for my momentum eqn matrix. When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my poisson eqn which is symmetric, I get the same error as above during run. Btw, I'm using hypre as the preconditioner and default solver. May I know why using these options give the error? Thank you very much. Regards. From knepley at gmail.com Tue May 27 11:13:12 2008 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 May 2008 11:13:12 -0500 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error In-Reply-To: <483C2FBF.90509@gmail.com> References: <483C2FBF.90509@gmail.com> Message-ID: On Tue, May 27, 2008 at 10:58 AM, Ben Tay wrote: > Hi, > > I read in the manual that I can use either call > MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) or call > MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). > > When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my momentum > eqn, I get the error on my home computer udring compiling: > > :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: This name > does not have a type, and must have an explicit type. 
> [MAT_NO_NEW_NONZERO_LOCATIONS] > call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) 1) It appears this symbol was not defined for Fortran. This is fixed in petsc-dev. > Although this error does not happen when I compile in linux, I can this > error during run: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC > ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to > find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > I use KSPRICHARDSON and PCILU for my momentum eqn matrix. > > When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my poisson > eqn which is symmetric, I get the same error as above during run. Btw, I'm > using hypre as the preconditioner and default solver. This option does not actually do anything. The SEGV must come from another error. Matt > May I know why using these options give the error? > > Thank you very much. > > Regards. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Amit.Itagi at seagate.com Tue May 27 12:10:25 2008 From: Amit.Itagi at seagate.com (Amit.Itagi at seagate.com) Date: Tue, 27 May 2008 13:10:25 -0400 Subject: Code structuring - Communicator In-Reply-To: Message-ID: owner-petsc-users at mcs.anl.gov wrote on 05/27/2008 11:23:44 AM: > On Tue, May 27, 2008 at 10:18 AM, wrote: > > Barry, > > > > I got a part of what I was trying to do (sub-communicator etc.), working. > > Now suppose I want to repeat a calculation with a different input, I have > > two ways of doing it (based on what I have coded). > > > > 1) > > > > MPI_Initialize > > Create a group using MPI_Comm_group > > Create several sub-groups and sub-communicators using MPI_Group_Incl and > > MPI_Comm_create > > Assign the sub-communicator to PETSC_COMM_WORLD > > // Calculation 1 > > { > > Do PetscInitialize > > Perform the calculation > > Do PetscFinalize > > } > > // Calculation 2 > > { > > Do PetscInitialize > > Perform the calculation > > Do PetscFinalize > > } > > Do MPI_finalize > > > > 2) > > > > MPI_Initialize > > Create a group using MPI_Comm_group > > Create several sub-groups and sub-communicators using MPI_Group_Incl and > > MPI_Comm_create > > Assign the sub-communicator to PETSC_COMM_WORLD > > Do PetscInitialize > > // Calculation 1 > > { > > Perform the calculation > > } > > // Calculation 2 > > { > > Perform the calculation > > } > > Do PetscFinalize > > Do MPI_finalize > > > > > > The first method crashes. I am trying to understand why. The documentation > > What do you mean "crashes" and what line does it happen on? You can use > -start_in_debugger to get a stack trace. I do not completely understand your > pseudocode, however, you should never call PetscInitialize()/Finalize() more > than once. 
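A minimal sketch of the arrangement being recommended, essentially option 2 above: MPI_Init and MPI_Finalize bracket the whole run, PETSC_COMM_WORLD is pointed at the sub-communicator before PetscInitialize, and PetscInitialize/PetscFinalize are each called exactly once around all the calculations. MPI_Comm_split and the trivial RunOneCalculation body are illustrative stand-ins for the group-based construction and the real solves described in the posts; 2.3.x calling sequences are assumed.

  #include "petsc.h"
  #include "petscvec.h"

  /* Placeholder for the real work: create, use and destroy a small vector
     on the given sub-communicator. */
  PetscErrorCode RunOneCalculation(MPI_Comm comm)
  {
    Vec            x;
    PetscErrorCode ierr;
    ierr = VecCreateMPI(comm, 10, PETSC_DETERMINE, &x);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = VecDestroy(x);CHKERRQ(ierr);   /* 2.3.x signature */
    return 0;
  }

  int main(int argc, char **argv)
  {
    MPI_Comm subcomm;
    int      rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* MPI_Comm_split used as a shorthand for the MPI_Comm_group /
       MPI_Group_incl / MPI_Comm_create construction; e.g. 16 ranks -> 4
       groups of 4, as in the original scenario. */
    MPI_Comm_split(MPI_COMM_WORLD, rank / 4, rank, &subcomm);

    PETSC_COMM_WORLD = subcomm;                              /* before PetscInitialize */
    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);   /* exactly once */

    RunOneCalculation(subcomm);   /* calculation 1 */
    RunOneCalculation(subcomm);   /* calculation 2: new input, same PETSc state */

    PetscFinalize();              /* exactly once, before MPI_Finalize */
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
  }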
As Barry pointed out, the multiple calls to PetscInitialize is the likely reason for my problem. Thanks, Matt and Barry. > > Matt > > > says that PetscFinalize calls MPI_finalize only if MPI_Initialize is not > > called before PetscInitialize. In my case, is PetscFinalize destroying the > > sub-communicators ? > > > > Thanks > > > > Rgds, > > Amit > > > > > > > > > > > > Barry Smith > > > ov> To > > Sent by: petsc-users at mcs.anl.gov > > owner-petsc-users cc > > @mcs.anl.gov > > No Phone Info Subject > > Available Re: Code structuring - Communicator > > > > > > 05/09/2008 03:07 > > PM > > > > > > Please respond to > > petsc-users at mcs.a > > nl.gov > > > > > > > > > > > > > > > > There are many ways to do this, most of them involve using MPI to > > construct subcommunicators > > for the various sub parallel tasks. You very likely want to keep > > PetscInitialize() at > > the very beginning of the program; you would not write the calls in > > terms of > > PETSC_COMM_WORLD or MPI_COMM_WORLD, rather you would use the > > subcommunicators to create the objects. > > > > An alternative approach is to look at the manual page for > > PetscOpenMPMerge(), PetscOpenMPRun(), > > PetscOpenMPNew() in petsc-dev. These allow a simple master-worker > > model of parallelism > > with PETSc with a bunch of masters that can work together (instead of > > just one master) and each > > master controls a bunch of workers. The code in src/ksp/pc/impls/ > > openmp uses this code. > > > > Note that OpenMP has NOTHING to do with OpenMP the standard. Also I > > don't really have > > any support for Fortran, I hope you use C/C++. Comments welcome. It > > sounds like this matches > > what you need. It's pretty cool, but underdeveloped. > > > > Barry > > > > > > > > On May 9, 2008, at 12:46 PM, Amit.Itagi at seagate.com wrote: > > > >> > >> Hi, > >> > >> I have a question about the Petsc communicator. I have a petsc program > >> "foo" which essentially runs in parallel and gives me > >> y=f(x1,x2,...), where > >> y is an output parameter and xi's are input parameters. Suppose, I > >> want to > >> run a parallel optimizer for the input parameters. I am looking at the > >> following functionality. I submit the optimizer job on 16 processors > >> (using > >> "mpiexec -np 16 progName"). The optimizer should then submit 4 runs of > >> "foo", each running parallely on 4 processors. "foo" will be written > >> as a > >> function and not as a main program in this case. How can I get this > >> functionality using Petsc ? Should PetscInitialize be called in the > >> optimizer, or in each foo run ? If PetscInitialize is called in the > >> optimizer, is there a way to make the foo function run only on a > >> subset of > >> the 16 processors ? > >> > >> May be, I haven't done a good job of explaining my problem. Let me > >> know if > >> you need any clarifications. > >> > >> Thanks > >> > >> Rgds, > >> Amit > >> > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > From bsmith at mcs.anl.gov Tue May 27 12:31:44 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 May 2008 12:31:44 -0500 Subject: Using MAT_NO_NEW_NONZERO_LOCATIONS and MAT_SYMMETRIC give error In-Reply-To: References: <483C2FBF.90509@gmail.com> Message-ID: <44E50826-91FD-40CF-BE02-EC9E0CCDDAE9@mcs.anl.gov> The calling sequence for this routine changed recently, make sure you are using the same PETSc version on your different systems. Also please send bug reports like this to petsc-maint at mcs.anl.gov not petsc-users at mcs.anl.gov Barry On May 27, 2008, at 11:13 AM, Matthew Knepley wrote: > On Tue, May 27, 2008 at 10:58 AM, Ben Tay wrote: >> Hi, >> >> I read in the manual that I can use either call >> MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) >> or call >> MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr). >> >> When I use MAT_NO_NEW_NONZERO_LOCATIONS for the matrix formed by my >> momentum >> eqn, I get the error on my home computer udring compiling: >> >> :\Myprojects2\imb_airfoil_x2_parallel\mom_disz.f90(356) : Error: >> This name >> does not have a type, and must have an explicit type. >> [MAT_NO_NEW_NONZERO_LOCATIONS] >> call >> MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > > 1) It appears this symbol was not defined for Fortran. This is fixed > in petsc-dev. > >> Although this error does not happen when I compile in linux, I can >> this >> error during run: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or - >> on_error_attach_debugger >> [0]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal >> [0]PETSC >> ERROR: or try http://valgrind.org on linux or man libgmalloc on >> Apple to >> find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, >> link, and >> run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Signal received! >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> >> I use KSPRICHARDSON and PCILU for my momentum eqn matrix. >> >> When I use MatSetOption(A_mat,MAT_SYMMETRIC,PETSC_TRUE,ierr) for my >> poisson >> eqn which is symmetric, I get the same error as above during run. >> Btw, I'm >> using hypre as the preconditioner and default solver. > > This option does not actually do anything. The SEGV must come from > another error. > > Matt > >> May I know why using these options give the error? >> >> Thank you very much. >> >> Regards. >> >> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > >

From balay at mcs.anl.gov Wed May 28 08:12:39 2008
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 28 May 2008 08:12:39 -0500 (CDT)
Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from [Marco Schauer ] (fwd)
Message-ID:

From: Marco Schauer
Date: Wed, 28 May 2008 12:29:47 +0200
To: petsc-users at mcs.anl.gov
Subject: How to use SNES

Hello, I'd like to solve a nonlinear system of equations. My system looks like this: K(u)*u = f, in which u and f are vectors and K is a matrix that depends on u. I already have a function that calculates K, but I don't know how I can use PETSc to solve this system?
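Matt's reply below gives the standard approach: write the problem as a residual F(u) = K(u)*u - f = 0, hand that function to SNES, and let -snes_fd supply the Jacobian by finite differences to begin with. A sketch of such a residual callback follows; FormFunction, BuildK (standing in for the existing routine that assembles K for a given u) and the AppCtx layout are illustrative assumptions rather than code from the original post, and 2.3.x calling sequences are assumed.

  #include "petscsnes.h"

  typedef struct {
    Mat K;   /* work matrix, reassembled for each new u */
    Vec f;   /* right-hand side */
  } AppCtx;

  extern PetscErrorCode BuildK(Vec u, Mat K);   /* the user's existing K(u) assembly */

  /* Residual F(u) = K(u)*u - f */
  PetscErrorCode FormFunction(SNES snes, Vec u, Vec F, void *ptr)
  {
    AppCtx         *ctx = (AppCtx*)ptr;
    PetscErrorCode ierr;

    ierr = BuildK(u, ctx->K);CHKERRQ(ierr);         /* K(u)        */
    ierr = MatMult(ctx->K, u, F);CHKERRQ(ierr);     /* F = K(u)*u  */
    ierr = VecAXPY(F, -1.0, ctx->f);CHKERRQ(ierr);  /* F = F - f   */
    return 0;
  }

  /* usage, roughly:
       SNESCreate(PETSC_COMM_WORLD,&snes);
       SNESSetFunction(snes,r,FormFunction,&ctx);
       SNESSetFromOptions(snes);          run with -snes_fd
       SNESSolve(snes,PETSC_NULL,u);      2.3.x calling sequence
  */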
Thanks for support, kind regard Marco Schauer From knepley at gmail.com Wed May 28 08:20:07 2008 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 May 2008 08:20:07 -0500 Subject: BOUNCE petsc-users@mcs.anl.gov: Non-member submission from [Marco Schauer ] (fwd) In-Reply-To: References: Message-ID: On Wed, May 28, 2008 at 8:12 AM, Satish Balay wrote: > > Approved:bsbglmdk > > Received: from mailgw.mcs.anl.gov (mailgw.mcs.anl.gov [140.221.9.4]) > by mcs.anl.gov (8.11.6/8.9.3) with ESMTP id m4SAUFM21948 > for ; Wed, 28 May 2008 05:30:16 -0500 > Received: from localhost (localhost [127.0.0.1]) > by mailgw.mcs.anl.gov (Postfix) with ESMTP id D6BA0348003 > for ; Wed, 28 May 2008 05:30:15 -0500 (CDT) > X-Greylist: delayed 23 seconds by postgrey-1.21 at mailgw.mcs.anl.gov; Wed, 28 May 2008 05:30:14 CDT > Received: from rzcomm22.rz.tu-bs.de (rzcomm22.rz.tu-bs.de [134.169.9.68]) > (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) > (No client certificate requested) > by mailgw.mcs.anl.gov (Postfix) with ESMTP id 5BAD2348002 > for ; Wed, 28 May 2008 05:30:14 -0500 (CDT) > Received: from [134.169.59.41] (seraph.infam.bau.tu-bs.de [134.169.59.41]) > by rzcomm22.rz.tu-bs.de (8.13.8/8.13.8) with ESMTP id m4SATmG0019828 > for ; Wed, 28 May 2008 12:29:49 +0200 > (envelope-from m.schauer at tu-bs.de) > Message-ID: <483D341B.6060100 at tu-bs.de> > Date: Wed, 28 May 2008 12:29:47 +0200 > From: Marco Schauer > User-Agent: Thunderbird 2.0.0.14 (Windows/20080421) > MIME-Version: 1.0 > To: petsc-users at mcs.anl.gov > Subject: How to use SNES > Content-Type: text/plain; charset=UTF-8; format=flowed > Content-Transfer-Encoding: 8bit > X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov > X-Spam-Status: No, hits=0.0 required=5.0 > tests=USER_AGENT > version=2.55 > X-Spam-Level: > X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp) > X-MCS-Mail-Loop: petsc-users > > Hello, > I???d like to compute a nonlinear equation system. My system looks like > this: K(u)*u=f, in which u, f are vectors and K is a Matrix that depends > on u. I have already a function to calculate K, but I don???t now how can > I use PETSc to solve this system? The first thing to do is formulate the system as F(u) = 0, so that would be F(u) = K(u)*u - f This is directly soluble with the option -snes_fd. If later you want to provide the Jacobian, you can. Matt > Thanks for support, kind regard > Marco Schauer -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From Lars.Rindorf at teknologisk.dk Wed May 28 09:07:05 2008 From: Lars.Rindorf at teknologisk.dk (Lars Rindorf) Date: Wed, 28 May 2008 16:07:05 +0200 Subject: Slow MatSetValues Message-ID: Hi everybody I have a problem with MatSetValues, since the building of my matrix takes much longer (35 s) than its solution (0.2 s). When the number of degrees of freedom is increased, then the problem worsens. The rate of which the elements of the (sparse) matrix is set also seems to decrease with the number of elements already set. That is, it becomes slower near the end. 
The structure of my program is something like: for element in finite elements for dof in element for equations in FEM formulation ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); where i,j,k,l are appropriate integers and tmp is a double value to be added. The code has fine worked with previous version of petsc (not compiled by me). The version of petsc that I use is slightly newer (I think), 2.3.3 vs ~2.3. Is it something of an dynamic allocation problem? I have tried using MatSetValuesBlock, but this is only slightly faster. If I monitor the program's CPU and memory consumption then the CPU is 100 % used and the memory consumption is only 20-30 mb. My computer is a red hat linux with a xeon quad core processor. I use Intel's MKL blas and lapack. What should I do to speed up the petsc? Kind regards Lars _____________________________ Lars Rindorf M.Sc., Ph.D. http://www.dti.dk Danish Technological Institute Gregersensvej 2630 Taastrup Denmark Phone +45 72 20 20 00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Wed May 28 09:18:43 2008 From: zonexo at gmail.com (Ben Tay) Date: Wed, 28 May 2008 22:18:43 +0800 Subject: Comparing matrices between 2 different codes and viewing of matrix In-Reply-To: References: <483942C7.7080705@gmail.com> Message-ID: <483D69C3.7030002@gmail.com> Hi Lisandro, Thank you for your help. But I believe the matrix of the serial and parallel, if done correctly are the same during viewing. However, the vectors will be split into different processors. Regards. Lisandro Dalcin wrote: > On 5/25/08, Ben Tay wrote: > >> What is the best way to compare the matrices from the 2 different code? >> >> I guess the most direct mtd is to use MatView to store the matrix in a >> ACSII file and spot the difference between the 2 files. However, I can't >> seem to get it right. What I did is: >> > > But then take into account that in the parallel case, the numbering of > the rows/cols of the matrix will perhaps not match the one of the > sequential case. If you save the parallel matrix, you will also need > to save an appropriate permutation to be able to actually compare the > matrices. Or perhaps better, a combination of an application ordering, > an index set, and MatPermute(). > > > From gdiso at ustc.edu Wed May 28 09:48:14 2008 From: gdiso at ustc.edu (Gong Ding) Date: Wed, 28 May 2008 22:48:14 +0800 Subject: MatZeroRows and MatAssembly Message-ID: Hi, I meet some trouble about my code. I'd like to use MatZeroRows to perform Dirichlet boundary condition. However, it requires MatAssembly, which packs the sparse matrix. Then the none zero pattern seems to be freezed. The continued MatSetValues may be very low efficient if the item not in the previous none zero pattern.. Is there any way to use MatZeroRows without freeze the none zero pattern? Or I have to redesign my code.... 
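Barry's answer below is to impose the Dirichlet rows only after the matrix has been completely constructed, so that the assembled nonzero pattern is never touched again. A minimal sketch of that order of operations, with placeholder arrays for the boundary rows and values; the four-argument MatZeroRows shown is the 2.3.x calling sequence (newer PETSc versions take two extra Vec arguments):

  #include "petscmat.h"

  /* Finish assembling the full matrix, then turn the Dirichlet rows into
     identity rows and put the boundary values into the right-hand side. */
  PetscErrorCode ApplyDirichlet(Mat A, Vec b, PetscInt nbc,
                                PetscInt bcrows[], PetscScalar bcvals[])
  {
    PetscErrorCode ierr;

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatZeroRows(A, nbc, bcrows, 1.0);CHKERRQ(ierr);  /* rows -> identity rows */
    ierr = VecSetValues(b, nbc, bcrows, bcvals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
    return 0;
  }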
Regards Gong Ding From bsmith at mcs.anl.gov Wed May 28 10:03:57 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 May 2008 10:03:57 -0500 Subject: Slow MatSetValues In-Reply-To: References: Message-ID: http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html Also, slightl less important, collapse the 4 MatSetValues() below into a single call that does the little two by two block Barry On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > Hi everybody > > I have a problem with MatSetValues, since the building of my matrix > takes much longer (35 s) than its solution (0.2 s). When the number > of degrees of freedom is increased, then the problem worsens. The > rate of which the elements of the (sparse) matrix is set also seems > to decrease with the number of elements already set. That is, it > becomes slower near the end. > > The structure of my program is something like: > > for element in finite elements > for dof in element > for equations in FEM formulation > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > where i,j,k,l are appropriate integers and tmp is a double value to > be added. > > The code has fine worked with previous version of petsc (not > compiled by me). The version of petsc that I use is slightly newer > (I think), 2.3.3 vs ~2.3. > > Is it something of an dynamic allocation problem? I have tried using > MatSetValuesBlock, but this is only slightly faster. If I monitor > the program's CPU and memory consumption then the CPU is 100 % used > and the memory consumption is only 20-30 mb. > > My computer is a red hat linux with a xeon quad core processor. I > use Intel's MKL blas and lapack. > > What should I do to speed up the petsc? > > Kind regards > Lars > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > > From bsmith at mcs.anl.gov Wed May 28 10:06:04 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 May 2008 10:06:04 -0500 Subject: MatZeroRows and MatAssembly In-Reply-To: References: Message-ID: On May 28, 2008, at 9:48 AM, Gong Ding wrote: > Hi, > I meet some trouble about my code. > I'd like to use MatZeroRows to perform Dirichlet boundary condition. > However, it requires MatAssembly, which packs the sparse matrix. > Then the none zero pattern seems to be freezed. Not froozen, but yes adding more nonzeros later will be inefficient. > The continued MatSetValues > may be very low efficient if the item not in the previous none zero > pattern.. Yes > > > Is there any way to use MatZeroRows without freeze the none zero > pattern? No. > > Or I have to redesign my code.... Generally you can apply the Dirichlet boundary conditions after you have completely constructed the matrix. Barry > > > Regards > Gong Ding > > From billy at dem.uminho.pt Thu May 29 15:50:23 2008 From: billy at dem.uminho.pt (=?iso-8859-1?Q?Billy_Ara=FAjo?=) Date: Thu, 29 May 2008 21:50:23 +0100 Subject: Slow MatSetValues References: Message-ID: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Hi, I just want to share my experience with FE assembly. 
I think the problem of preallocation in finite element matrices is that you don't know how many elements are connected to a given node, there can be 5, 20 elements or more. You can build a structure with the number of nodes connected to a node and then preallocate the matrix but this is not very efficient. I know UMFPACK has a method of forming triplets with the matrix information and then it has routines to add duplicate entries and compress the data in a compressed matrix format. Although I have never used UMFPACK with PETSC. I also don't know if there are similiar functions in PETSC optimized for FE matrix assembly. Regards, Billy. -----Mensagem original----- De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith Enviada: qua 28-05-2008 16:03 Para: petsc-users at mcs.anl.gov Assunto: Re: Slow MatSetValues http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html Also, slightl less important, collapse the 4 MatSetValues() below into a single call that does the little two by two block Barry On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > Hi everybody > > I have a problem with MatSetValues, since the building of my matrix > takes much longer (35 s) than its solution (0.2 s). When the number > of degrees of freedom is increased, then the problem worsens. The > rate of which the elements of the (sparse) matrix is set also seems > to decrease with the number of elements already set. That is, it > becomes slower near the end. > > The structure of my program is something like: > > for element in finite elements > for dof in element > for equations in FEM formulation > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > where i,j,k,l are appropriate integers and tmp is a double value to > be added. > > The code has fine worked with previous version of petsc (not > compiled by me). The version of petsc that I use is slightly newer > (I think), 2.3.3 vs ~2.3. > > Is it something of an dynamic allocation problem? I have tried using > MatSetValuesBlock, but this is only slightly faster. If I monitor > the program's CPU and memory consumption then the CPU is 100 % used > and the memory consumption is only 20-30 mb. > > My computer is a red hat linux with a xeon quad core processor. I > use Intel's MKL blas and lapack. > > What should I do to speed up the petsc? > > Kind regards > Lars > _____________________________ > > > Lars Rindorf > M.Sc., Ph.D. > > http://www.dti.dk > > Danish Technological Institute > Gregersensvej > > 2630 Taastrup > > Denmark > Phone +45 72 20 20 00 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Thu May 29 16:49:42 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 May 2008 16:49:42 -0500 Subject: Slow MatSetValues In-Reply-To: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Message-ID: <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> Partition the elements across the processes, then partition the nodes across processes (try to make sure that each node is on the same process of at least one of its elements), create 1) three parallel vectors with the number of local owned nodes on each process call these vectors off and on and owner; fill the on vector with a 1 in each location, fill the vector owner with rank for each element 2) three sequential vectors on each process with the total number of nodes of all the elements of that process (this is the locally owned plus ghosted nodes) call these vectors ghostedoff and ghostedon and ghostedowner 3) a VecScatter from the "locally owned plus ghosted nodes" to the "local owned nodes" [you need these anyways for the numerical part of the code when you evaluate your nonlinear functions (or right hand side for linear problems) scatter the owner vector to the ghostedowner vector now on each process loop over the locally owned ELEMENTS for each node1 in that element for each node2 in that element (excluding the node1 in the outer loop) if node1 and node2 share an edge (face in 3d) and that edge (face in 3d) is not a boundary edge (face in 3d) set t = .5 (this prevents double counting of these couplings) else set t = 1.0 if node1 and node2 are both owned by the same process** addt t into ghostedon at both the node1 location and the node2 location if node1 and node2 are owned by different processes add t into ghostedoff at both the node1 and node2 location Do a VecScatter add from the ghostedoff and ghostedon into the off and on. The off and on now contain exactly the preallocation need for each processes preallocation. The amount of work required is proportional to the number of elements times the (number of nodes on an element)^2, the amount of memory needed is roughly three global vectors and three local vectors. This is much less work and memory then needed in the numerical part of the code hence is very efficient. In fact it is likely much cheaper than a single nonlinear function evaluation. Barry ** two nodes are owned by the same process if ghostedowner of node1 matches ghostedowner of node2 On May 29, 2008, at 3:50 PM, Billy Ara?jo wrote: > > Hi, > > I just want to share my experience with FE assembly. > I think the problem of preallocation in finite element matrices is > that you don't know how many elements are connected to a given node, > there can be 5, 20 elements or more. You can build a structure with > the number of nodes connected to a node and then preallocate the > matrix but this is not very efficient. > > I know UMFPACK has a method of forming triplets with the matrix > information and then it has routines to add duplicate entries and > compress the data in a compressed matrix format. Although I have > never used UMFPACK with PETSC. I also don't know if there are > similiar functions in PETSC optimized for FE matrix assembly. > > Regards, > > Billy. 
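The per-row totals produced by this counting pass (the "on" and "off" vectors) are exactly what the d_nnz and o_nnz arguments of MatCreateMPIAIJ, linked earlier in the thread, expect; once they are correct, the MatSetValues loop triggers no further mallocs. A minimal sketch of that final step, with placeholder names for the local row count and the two count arrays:

  #include "petscmat.h"

  /* Create a parallel AIJ matrix preallocated from per-row counts:
     d_nnz[i] = nonzeros of local row i in the diagonal block,
     o_nnz[i] = nonzeros of local row i in the off-diagonal block. */
  PetscErrorCode CreatePreallocatedFEMatrix(MPI_Comm comm, PetscInt nlocal,
                                            PetscInt d_nnz[], PetscInt o_nnz[],
                                            Mat *A)
  {
    PetscErrorCode ierr;

    ierr = MatCreateMPIAIJ(comm, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE,
                           0, d_nnz, 0, o_nnz, A);CHKERRQ(ierr);
    return 0;
  }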
> > > > -----Mensagem original----- > De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith > Enviada: qua 28-05-2008 16:03 > Para: petsc-users at mcs.anl.gov > Assunto: Re: Slow MatSetValues > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > > Also, slightl less important, collapse the 4 MatSetValues() below > into a single call that does the little two by two block > > Barry > > On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > > > Hi everybody > > > > I have a problem with MatSetValues, since the building of my matrix > > takes much longer (35 s) than its solution (0.2 s). When the number > > of degrees of freedom is increased, then the problem worsens. The > > rate of which the elements of the (sparse) matrix is set also seems > > to decrease with the number of elements already set. That is, it > > becomes slower near the end. > > > > The structure of my program is something like: > > > > for element in finite elements > > for dof in element > > for equations in FEM formulation > > ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > > ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > > > > where i,j,k,l are appropriate integers and tmp is a double value to > > be added. > > > > The code has fine worked with previous version of petsc (not > > compiled by me). The version of petsc that I use is slightly newer > > (I think), 2.3.3 vs ~2.3. > > > > Is it something of an dynamic allocation problem? I have tried using > > MatSetValuesBlock, but this is only slightly faster. If I monitor > > the program's CPU and memory consumption then the CPU is 100 % used > > and the memory consumption is only 20-30 mb. > > > > My computer is a red hat linux with a xeon quad core processor. I > > use Intel's MKL blas and lapack. > > > > What should I do to speed up the petsc? > > > > Kind regards > > Lars > > _____________________________ > > > > > > Lars Rindorf > > M.Sc., Ph.D. > > > > http://www.dti.dk > > > > Danish Technological Institute > > Gregersensvej > > > > 2630 Taastrup > > > > Denmark > > Phone +45 72 20 20 00 > > > > > > From dalcinl at gmail.com Thu May 29 16:44:56 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 29 May 2008 18:44:56 -0300 Subject: Slow MatSetValues In-Reply-To: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> Message-ID: I will not buy at first that things can be done better than looping over elements filling a std::vector< std::set >, next filling a vector with each row size, next preallocating the AIJ matrix, and finally another loop filling matrix rows with zeros, ones, or garbage. All this are about 10-20 lines of code using simple C++ STL containers and a few calls to PETSc. If anyone has a better way, and can demostrate it with actual timing numbers and some self contained example code, the perhaps I can take the effort of adding something like this to PETSc. But I doubt that this effort is going to pay something ;-). On 5/29/08, Billy Ara?jo wrote: > Hi, > > I just want to share my experience with FE assembly. 
> I think the problem of preallocation in finite element matrices is that you > don't know how many elements are connected to a given node, there can be 5, > 20 elements or more. You can build a structure with the number of nodes > connected to a node and then preallocate the matrix but this is not very > efficient. > > I know UMFPACK has a method of forming triplets with the matrix information > and then it has routines to add duplicate entries and compress the data in a > compressed matrix format. Although I have never used UMFPACK with PETSC. I > also don't know if there are similiar functions in PETSC optimized for FE > matrix assembly. > > Regards, > > Billy. > > > > -----Mensagem original----- > De: owner-petsc-users at mcs.anl.gov em nome de Barry Smith > Enviada: qua 28-05-2008 16:03 > Para: petsc-users at mcs.anl.gov > Assunto: Re: Slow MatSetValues > > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > > Also, slightl less important, collapse the 4 MatSetValues() below > into a single call that does the little two by two block > > Barry > > On May 28, 2008, at 9:07 AM, Lars Rindorf wrote: > > > Hi everybody > > > > I have a problem with MatSetValues, since the building of my matrix > > takes much longer (35 s) than its solution (0.2 s). When the number > > of degrees of freedom is increased, then the problem worsens. The > > rate of which the elements of the (sparse) matrix is set also seems > > to decrease with the number of elements already set. That is, it > > becomes slower near the end. > > > > The structure of my program is something like: > > > > for element in finite elements > > for dof in element > > for equations in FEM formulation > > ierr = > MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_Values); > > ierr = > MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_Values); > > > > > > where i,j,k,l are appropriate integers and tmp is a double value to > > be added. > > > > The code has fine worked with previous version of petsc (not > > compiled by me). The version of petsc that I use is slightly newer > > (I think), 2.3.3 vs ~2.3. > > > > Is it something of an dynamic allocation problem? I have tried using > > MatSetValuesBlock, but this is only slightly faster. If I monitor > > the program's CPU and memory consumption then the CPU is 100 % used > > and the memory consumption is only 20-30 mb. > > > > My computer is a red hat linux with a xeon quad core processor. I > > use Intel's MKL blas and lapack. > > > > What should I do to speed up the petsc? > > > > Kind regards > > Lars > > _____________________________ > > > > > > Lars Rindorf > > M.Sc., Ph.D. 
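A sketch of the 10-20 line recipe described above, under the assumption of one degree of freedom per node and a sequential matrix; the container layout used for the element connectivity is an illustrative choice, not code from the original post:

  #include <vector>
  #include <set>
  #include "petscmat.h"

  /* One pass over the elements collects the column set of every row, the set
     sizes give the per-row preallocation, and the subsequent MatSetValues
     assembly then runs without mallocs. */
  PetscErrorCode PreallocateFromElements(PetscInt nnodes,
                                         const std::vector< std::vector<PetscInt> > &elements,
                                         Mat *A)
  {
    std::vector< std::set<PetscInt> > adj(nnodes);
    for (size_t e = 0; e < elements.size(); e++) {
      const std::vector<PetscInt> &nodes = elements[e];
      for (size_t i = 0; i < nodes.size(); i++)
        for (size_t j = 0; j < nodes.size(); j++)
          adj[nodes[i]].insert(nodes[j]);   /* every node couples to every node of the element */
    }

    std::vector<PetscInt> nnz(nnodes);
    for (PetscInt row = 0; row < nnodes; row++) nnz[row] = (PetscInt)adj[row].size();

    PetscErrorCode ierr;
    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nnodes, nnodes, 0, &nnz[0], A);CHKERRQ(ierr);
    return 0;
  }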
> > > > http://www.dti.dk > > > > Danish Technological Institute > > Gregersensvej > > > > 2630 Taastrup > > > > Denmark > > Phone +45 72 20 20 00 > > > > > > > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bsmith at mcs.anl.gov Thu May 29 20:21:08 2008 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 May 2008 20:21:08 -0500 Subject: Slow MatSetValues In-Reply-To: <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> References: <1200D8BEDB3DD54DBA528E210F372BF3D94476@BEFUNCIONARIOS.uminho.pt> <3BEF7FCC-AF4A-461A-B068-9DB01EF94B7A@mcs.anl.gov> Message-ID: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov> I realize I made a mistake for three dimensions below; when nodes share an edge in 3d they will over counted. The fix is to have another array with one entry per edge that gives the number of elements that contain that edge. Then use if node1 and node2 share an edge then t = 1/ elementsperedge[edge that connects node1 and node2] > else if node1 and node2 share an face in 3d and that > face in 3d is not a boundary face set t = .5 (this prevents double > counting of these couplings) > else set t = 1.0 This increases the complexity of the code a bit but is still very rapid. Barry On May 29, 2008, at 4:49 PM, Barry Smith wrote: > > Partition the elements across the processes, > > then partition the nodes across processes (try to make sure that > each node is on the same process of at least one of its elements), > > create > 1) three parallel vectors with the number of local owned nodes > on each process > call these vectors off and on and owner; fill the on vector > with a 1 in each location, fill the vector owner with rank for each > element > 2) three sequential vectors on each process with the total > number of nodes of all the elements of that process (this is the > locally owned plus ghosted nodes) > call these vectors ghostedoff and ghostedon and ghostedowner > 3) a VecScatter from the "locally owned plus ghosted nodes" to > the "local owned nodes" > [you need these anyways for the numerical part of the code when > you evaluate your nonlinear functions (or right hand side for linear > problems) > > scatter the owner vector to the ghostedowner vector > now on each process loop over the locally owned ELEMENTS > for each node1 in that element > for each node2 in that element (excluding the node1 in the > outer loop) > if node1 and node2 share an edge (face in 3d) and > that edge (face in 3d) is not a boundary edge (face in 3d) set t = . > 5 (this prevents double counting of these couplings) > else set t = 1.0 > if node1 and node2 are both owned by the same > process** addt t into ghostedon at both the node1 location and the > node2 location > if node1 and node2 are owned by different processes > add t into ghostedoff at both the node1 and node2 location > > Do a VecScatter add from the ghostedoff and ghostedon into the off > and on. > > The off and on now contain exactly the preallocation need for each > processes preallocation. > > The amount of work required is proportional to the number of > elements times the (number of nodes on an element)^2, the amount of > memory > needed is roughly three global vectors and three local vectors. 
>    This is much less work and memory than is needed in the numerical part
> of the code, hence it is very efficient. In fact it is likely much cheaper
> than a single nonlinear function evaluation.
>
>    Barry
>
> ** two nodes are owned by the same process if ghostedowner of node1
> matches ghostedowner of node2
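[Barry's recipe computes exact counts in parallel. For a sequential code, even a much cruder counting pass is usually enough to make MatSetValues() fast. The sketch below is not Barry's algorithm: it assumes one scalar unknown per node and a flat element-to-node connectivity array, and it only computes an upper bound per row, over-counting couplings that several elements share; the price is some wasted memory, never a missing preallocation. The function name and arguments are illustrative.]

    #include "petscmat.h"

    /* Crude serial preallocation from element connectivity: every element that
       touches a node contributes (nodes_per_el - 1) potential couplings to that
       node's row, plus 1 for the diagonal.  nnz[] is an upper bound, never an
       underestimate, because shared neighbours are counted once per element. */
    PetscErrorCode PreallocateFromElements(PetscInt nnodes, PetscInt nelem,
                                           PetscInt nodes_per_el,
                                           const PetscInt *elem, /* nelem*nodes_per_el node indices */
                                           Mat *A)
    {
      PetscErrorCode ierr;
      PetscInt       *nnz, e, a;

      PetscFunctionBegin;
      ierr = PetscMalloc(nnodes*sizeof(PetscInt), &nnz);CHKERRQ(ierr);
      for (a = 0; a < nnodes; a++) nnz[a] = 1;                       /* the diagonal entry */
      for (e = 0; e < nelem; e++) {
        for (a = 0; a < nodes_per_el; a++) {
          nnz[elem[e*nodes_per_el + a]] += nodes_per_el - 1;         /* couplings inside element e */
        }
      }
      for (a = 0; a < nnodes; a++) if (nnz[a] > nnodes) nnz[a] = nnodes;  /* cannot exceed the row length */
      ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nnodes, nnodes, 0, nnz, A);CHKERRQ(ierr);
      ierr = PetscFree(nnz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

[An exact count, avoiding the over-allocation, needs the kind of edge/face bookkeeping Barry describes above.]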
From Lars.Rindorf at teknologisk.dk Fri May 30 06:44:12 2008
From: Lars.Rindorf at teknologisk.dk (Lars Rindorf)
Date: Fri, 30 May 2008 13:44:12 +0200
Subject: SV: Slow MatSetValues
In-Reply-To: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov>
Message-ID:

Hi everybody

Thanks for all the suggestions and help. The problem is of a somewhat different
nature. I use only direct solvers, so I give the options "-ksp_type preonly
-pc_type lu" to get a standard LU factorization. This works fine without any
problems. If I additionally set "-mat_type umfpack" to use UMFPACK, then
MatSetValues is very, very slow (about 50 times slower). If, as a test, I call
MatAssemblyBegin and MatAssemblyEnd before MatSetValues, and only use the LU
(no UMFPACK), then the performance is similarly slow.

My code is otherwise identical in its PETSc setup to that at
http://www-unix.mcs.anl.gov/petsc/petsc-2/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex8.c.html

There is no need to invoke MatAssemblyBegin() with the argument
MAT_FLUSH_ASSEMBLY, since MatSetValues is only given the ADD_VALUES argument.
So it is not that.

Is there some conflict between the matrix format used by UMFPACK and something
else?

KR, Lars
From jed at 59A2.org Fri May 30 07:31:09 2008
From: jed at 59A2.org (Jed Brown)
Date: Fri, 30 May 2008 14:31:09 +0200
Subject: SV: Slow MatSetValues
In-Reply-To:
References: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov>
Message-ID: <20080530123109.GB3835@brakk.ethz.ch>

I've seen this problem when preallocation information is lost by changing the
matrix type. Try putting MatSeqAIJSetPreallocation() and/or (it doesn't hurt to
do both) MatMPIAIJSetPreallocation() after MatSetFromOptions(). This will
preallocate for any matrix type that inherits from these two types (which I
think is anything you might use).

Jed
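[Following Jed's suggestion, the creation sequence could look something like the sketch below. The helper name is illustrative; d_nnz/o_nnz are per-row counts computed beforehand (for a sequential matrix only the first array matters), and, as Jed notes, the preallocation call that does not match the final type simply has no effect.]

    #include "petscmat.h"

    /* Create a matrix whose type can still be changed on the command line
       (e.g. -mat_type aij / umfpack / ...) and preallocate it AFTER the type
       has been fixed by MatSetFromOptions(). */
    PetscErrorCode CreatePreallocatedMat(MPI_Comm comm, PetscInt mlocal, PetscInt N,
                                         const PetscInt *d_nnz, const PetscInt *o_nnz,
                                         Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, mlocal, mlocal, N, N);CHKERRQ(ierr);
      ierr = MatSetFromOptions(*A);CHKERRQ(ierr);                   /* type is fixed here       */
      ierr = MatSeqAIJSetPreallocation(*A, 0, d_nnz);CHKERRQ(ierr); /* after MatSetFromOptions() */
      ierr = MatMPIAIJSetPreallocation(*A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }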
From gdiso at ustc.edu Fri May 30 10:19:32 2008
From: gdiso at ustc.edu (Gong Ding)
Date: Fri, 30 May 2008 23:19:32 +0800
Subject: problem of parallel MatAssembled
Message-ID: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>

Hi,
I use MatAssembled to determine if a parallel matrix (MPIAIJ) is assembled.
One processor says 1 and another says 0. (PETSc 2.3.3-p2)
The correct answer should be the same on all the processors, I think.
Is it a bug, or did I forget something?

BTW. I think a function such as MatAddRowToRow would be useful.
I had implemented it with MatGetRow and MatSetValues (or use Mat*Mat?).
However, writing it at a lower level should be more efficient. If the
developers could add this to PETSc...

Regards
Gong Ding

From dalcinl at gmail.com Fri May 30 10:37:00 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 30 May 2008 12:37:00 -0300
Subject: problem of parallel MatAssembled
In-Reply-To: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
References: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
Message-ID:

On 5/30/08, Gong Ding wrote:
> Hi,
> I use MatAssembled to determine if a parallel matrix (MPIAIJ) is assembled.
> One processor says 1 and another says 0. (PETSc 2.3.3-p2)
> The correct answer should be the same on all the processors, I think.
> Is it a bug, or did I forget something?

Are you completely sure that you collectively called at ALL processes

MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)

Please note the MAT_FINAL_ASSEMBLY. If you used MAT_FLUSH_ASSEMBLY at
some process, then what you get is expected.

> BTW. I think a function such as MatAddRowToRow would be useful.
> I had implemented it with MatGetRow and MatSetValues (or use Mat*Mat?).

Not sure what you are trying to do, please elaborate a bit more. Why is a
call like this

PetscInt row = ...
PetscInt ncols = ...
PetscInt *cols_indices = ...
PetscScalar *cols_values = ...

MatSetValues(A, 1, &row, ncols, cols_indices, cols_values, ADD_VALUES)

not enough for you?

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From gdiso at ustc.edu Fri May 30 11:36:03 2008
From: gdiso at ustc.edu (Gong Ding)
Date: Sat, 31 May 2008 00:36:03 +0800
Subject: problem of parallel MatAssembled
References: <705F7E96E4A54270AC5F62FDCC871756@ustcatmel>
Message-ID:

----- Original Message -----
From: "Lisandro Dalcin"
To:
Sent: Friday, May 30, 2008 11:37 PM
Subject: Re: problem of parallel MatAssembled

> Are you completely sure that you collectively called at ALL processes
>
> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)
> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY)
>
> Please note the MAT_FINAL_ASSEMBLY. If you used MAT_FLUSH_ASSEMBLY at
> some process, then what you get is expected.

I will check it.

> Not sure what you are trying to do, please elaborate a bit more. Why is a
> call like this
>
> MatSetValues(A, 1, &row, ncols, cols_indices, cols_values, ADD_VALUES)
>
> not enough for you?

I am using a somewhat involved approach to deal with a multi-material domain.
The domain is decomposed into several regions with different materials. Each
region has its own governing equation. As a result, I first build the equations
on each region, then process the region interfaces and boundaries. More than 10
boundary types exist in my problem.

For a node located on a region interface, the two regions both own a copy of
the node, so I need to combine the governing equations of the two nodes. For a
variable with a continuous value on the interface, the sum of the two equations
should be zero and the values of the two nodes should be equal. So I need to
add some rows to other rows in the PETSc vector and add an equation v1-v2=0.

The equation is nonlinear, so I have to evaluate a Jacobian matrix, which is
done by AD from the equation evaluation in each region. So I also need to add
some rows of the Jacobian matrix to other rows.

I wonder how others handle multi-material problems?

G.D.
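[As a rough illustration of the MatGetRow()/MatSetValues() combination Gong Ding mentions: the sketch below adds row src of A into row dst. The function name is illustrative, not PETSc API; it assumes A is already assembled, both rows are locally owned, and the caller re-assembles A after all such additions. The row is copied first so that MatSetValues() is not called while MatGetRow() still holds the row data.]

    #include "petscmat.h"

    /* Add row "src" of A into row "dst" of A with ADD_VALUES. */
    PetscErrorCode MatAddRowToRow_Sketch(Mat A, PetscInt src, PetscInt dst)
    {
      PetscErrorCode     ierr;
      PetscInt           n, ncols, *ccols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscScalar       *cvals;

      PetscFunctionBegin;
      ierr = MatGetRow(A, src, &ncols, &cols, &vals);CHKERRQ(ierr);
      n    = ncols;
      ierr = PetscMalloc(n*sizeof(PetscInt), &ccols);CHKERRQ(ierr);
      ierr = PetscMalloc(n*sizeof(PetscScalar), &cvals);CHKERRQ(ierr);
      ierr = PetscMemcpy(ccols, cols, n*sizeof(PetscInt));CHKERRQ(ierr);
      ierr = PetscMemcpy(cvals, vals, n*sizeof(PetscScalar));CHKERRQ(ierr);
      ierr = MatRestoreRow(A, src, &ncols, &cols, &vals);CHKERRQ(ierr);
      ierr = MatSetValues(A, 1, &dst, n, ccols, cvals, ADD_VALUES);CHKERRQ(ierr);
      ierr = PetscFree(ccols);CHKERRQ(ierr);
      ierr = PetscFree(cvals);CHKERRQ(ierr);
      /* caller: MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY) after all additions */
      PetscFunctionReturn(0);
    }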
From keita at cray.com Fri May 30 16:04:10 2008
From: keita at cray.com (Keita Teranishi)
Date: Fri, 30 May 2008 16:04:10 -0500
Subject: Support for SuperLU_Dist 2.2
Message-ID: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>

Hi,

Does PETSc support SuperLU_DIST version 2.2? I am interested in using the new
SuperLU through PETSc's KSP interface.

Thanks,

================================
Keita Teranishi
Math Software Group
Cray, Inc.
keita at cray.com
================================

From balay at mcs.anl.gov Fri May 30 16:16:26 2008
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 30 May 2008 16:16:26 -0500 (CDT)
Subject: Support for SuperLU_Dist 2.2
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF05BCA9B7@CFEVS1-IP.americas.cray.com>
Message-ID:

SuperLU_DIST 2.2 support appears to be in petsc-dev.

Satish

From zonexo at gmail.com Fri May 30 17:33:03 2008
From: zonexo at gmail.com (Ben Tay)
Date: Sat, 31 May 2008 06:33:03 +0800
Subject: Solving 2 eqns at the same time in PETSc
Message-ID: <4840809F.5050705@gmail.com>

Hi,

(I sent this email a while ago but I used a different email address. Not sure
if it got through since it's not registered in the server list. Sorry if it was
resent.)

I obtain 2 linear eqns (u and v velocity) from the momentum eqn in my CFD code.
Instead of solving eqn 1 in parallel, and then subsequently eqn 2 in parallel,
I am thinking of solving the 2 eqns at the same time, using half the number of
processors on each eqn. In other words, when using 4 processors, I use 2
processors for eqn 1 and 2 processors for eqn 2. Will that be possible?

I thought that in MPI, if an equation is divided among too many processors, its
scaling factor will decrease. So by dividing it among fewer processors and
solving the equations simultaneously, it should give better performance. Is
that true?

I've also successfully coded 1 eqn to be solved in parallel in PETSc. What
changes do I have to make now?

Thank you very much.

Regards.

From bsmith at mcs.anl.gov Fri May 30 21:48:13 2008
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 30 May 2008 21:48:13 -0500
Subject: Solving 2 eqns at the same time in PETSc
In-Reply-To: <4840809F.5050705@gmail.com>
References: <4840809F.5050705@gmail.com>
Message-ID:

On May 30, 2008, at 5:33 PM, Ben Tay wrote:

> I obtain 2 linear eqns (u and v velocity) from the momentum eqn in
> my CFD code. Instead of solving eqn 1 in parallel, and then
> subsequently eqn 2 in parallel, I am thinking of solving the 2 eqns
> at the same time, using half the number of processors on each eqn.
> In other words, when using 4 processors, I use 2 processors for eqn
> 1 and 2 processors for eqn 2. Will that be possible?

    You simply use the MPI_Group and MPI_Comm commands to make MPI
communicators for the subsets of processes and then construct the Vec, Mat,
and KSP based on those new communicators.

    Barry
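[A minimal sketch of what Barry describes, using MPI_Comm_split() for brevity rather than the MPI_Group routines. The function and variable names are illustrative, n is the global size of each system, and the identity-matrix fill is only a stand-in for the real u- and v-momentum coefficients.]

    #include "petscksp.h"

    /* Split PETSC_COMM_WORLD into two halves; each half builds and solves
       its own linear system concurrently on a half-sized communicator. */
    PetscErrorCode SolveTwoSystemsConcurrently(PetscInt n)
    {
      PetscErrorCode ierr;
      PetscMPIInt    rank, size, color;
      MPI_Comm       subcomm;
      Mat            A;
      Vec            x, b;
      KSP            ksp;
      PetscInt       i, rstart, rend;
      PetscScalar    one = 1.0;

      PetscFunctionBegin;
      ierr  = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
      ierr  = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
      color = (rank < size/2) ? 0 : 1;             /* 0 -> u equation, 1 -> v equation */
      ierr  = MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);CHKERRQ(ierr);

      /* everything below lives on the half-sized communicator */
      ierr = MatCreate(subcomm, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      /* stand-in assembly: identity; a real code inserts the u- or v-momentum
         coefficients here depending on color */
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        ierr = MatSetValues(A, 1, &i, 1, &i, &one, INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = MatGetVecs(A, &x, &b);CHKERRQ(ierr);
      ierr = VecSet(b, 1.0);CHKERRQ(ierr);
      ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

      ierr = KSPDestroy(ksp);CHKERRQ(ierr);
      ierr = VecDestroy(x);CHKERRQ(ierr);
      ierr = VecDestroy(b);CHKERRQ(ierr);
      ierr = MatDestroy(A);CHKERRQ(ierr);
      ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }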
From bsmith at mcs.anl.gov Fri May 30 22:09:17 2008
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 30 May 2008 22:09:17 -0500
Subject: SV: Slow MatSetValues
In-Reply-To: <20080530123109.GB3835@brakk.ethz.ch>
References: <22388508-E202-4154-B96D-433471A6D090@mcs.anl.gov> <20080530123109.GB3835@brakk.ethz.ch>
Message-ID:

    This is a serious flaw in our user interface; I'm fixing it now, and our
next release will vastly simplify the handling of external direct solvers and
make this problem impossible.

    Barry

On May 30, 2008, at 7:31 AM, Jed Brown wrote:

> I've seen this problem when preallocation information is lost by changing the
> matrix type. Try putting MatSeqAIJSetPreallocation() and/or (it doesn't hurt to
> do both) MatMPIAIJSetPreallocation() after MatSetFromOptions(). This will
> preallocate for any matrix type that inherits from these two types (which I
> think is anything you might use).
>
> Jed