From hung.thanh.nguyen at petrell.no Tue Feb 1 08:01:43 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Tue, 01 Feb 2011 15:01:43 +0100 Subject: [petsc-users] (no subject) Message-ID: I am trying to install PETSc on Windows. I am using Intel C compiler and Intel MKL. I will use whichever MPI library is most appropriate (MPICH?). I have tried to follow the installation instructions: installed cygwin and try to run configure in the cygwin shell. But I get some error regarding some python code or compiler ( not sure which). My main question is: Does anybode have a more detailed installation instruction for PETSc on windows systems? My goal is to have PETSc loaded as a library in to my visual studio project. Alternatively: Should I post the error message I get when I try to configure and take it from there? Any help would be much appreciated! Hung Research in matrix solvers -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 08:09:51 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 08:09:51 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Tue, Feb 1, 2011 at 8:01 AM, Hung Thanh Nguyen < hung.thanh.nguyen at petrell.no> wrote: > I am trying to install PETSc on Windows. I am using Intel C compiler and > Intel MKL. I will use whichever MPI library is most appropriate (MPICH?). > > > > I have tried to follow the installation instructions: installed cygwin and > try to run configure in the cygwin shell. But I get some error regarding > some python code or compiler ( not sure which). > > > > My main question is: Does anybode have a more detailed installation > instruction for PETSc on windows systems? My goal is to have PETSc loaded as > a library in to my visual studio project. > > > > Alternatively: Should I post the error message I get when I try to > configure and take it from there? > Yes, please send the entire error message to petsc-maint at mcs.anl.gov. Matt > > > Any help would be much appreciated! > > Hung > > Research in matrix solvers > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhender at us.ibm.com Tue Feb 1 09:03:51 2011 From: mhender at us.ibm.com (Michael E Henderson) Date: Tue, 1 Feb 2011 10:03:51 -0500 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse Message-ID: Good morning, I'm trying to use gmres with an ilu preconditioner and having trouble getting the options right. I figure it's got to be something simple, so hope it's an easy question. With options: -ksp_type gmres -pc_type ilu -pc_factor_levels 10 -pc_factor_fill 10 -pc_factor_mat_solver_package spai I get the message: unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver spai. Perhaps you must config/configure.py with --download-spai I checked the configuration output and spai was indeed configured and built. I also tried spooles with a similar result. The table http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html seems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is that true? 
-pc_factor_mat_solver_package hypre -pc_hypre_type euclid also gives unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver hypre. Perhaps you must config/configure.py with --download-hypre I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. Am I doing something obviously wrong? Thanks, Mike Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 09:44:38 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 09:44:38 -0600 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse In-Reply-To: References: Message-ID: On Tue, Feb 1, 2011 at 9:03 AM, Michael E Henderson wrote: > Good morning, > > I'm trying to use gmres with an ilu preconditioner and having trouble > getting the options right. I figure it's got to be something simple, so hope > it's an easy question. > This is the joy of using other packages. Looking at the source, I see 4 packages which can factor a parallel (MPIAIJ) matrix: MUMPS, SuperLU_dist, Spooles (now unsupported), Pastix These are all usable from -pc_factor_mat_solver_package when using MPIAIJ. 1) We do not consider SPAI a matrix factorization package. You just use it with --pc_type spai 2) We cannot see inside Hypre, and thus it is hard to get into this framework. You use Euclid with -pc_type hypre -pc_hypre_type euclid Matt > With options: > > -ksp_type gmres > -pc_type ilu > -pc_factor_levels 10 > -pc_factor_fill 10 > -pc_factor_mat_solver_package spai > > I get the message: > > unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix > format mpiaij does not have a solver spai. Perhaps you must > config/configure.py with --download-spai > > I checked the configuration output and spai was indeed configured and > built. I also tried spooles with a similar result. > > The table > http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.htmlseems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is > that true? > > -pc_factor_mat_solver_package hypre > -pc_hypre_type euclid > > also gives > > unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix > format mpiaij does not have a solver hypre. Perhaps you must > config/configure.py with --download-hypre > > I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. > Am I doing something obviously wrong? > > Thanks, > > Mike Henderson > > ------------------------------------------------------------------------------------------------------------------------------------ > Mathematical Sciences, TJ Watson Research Center > mhender at watson.ibm.com > http://www.research.ibm.com/people/h/henderson/ > http://multifario.sourceforge.net/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
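For reference, a minimal sketch of the option combinations described in the reply above; the executable name and process count are placeholders, and each variant assumes the corresponding package was enabled when PETSc was configured:

  # full parallel LU factorization through an external package such as MUMPS
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type lu -pc_factor_mat_solver_package mumps
  # parallel ILU(k) through Hypre's Euclid
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type hypre -pc_hypre_type euclid
  # sparse approximate inverse preconditioning with SPAI
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type spai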
URL: From mhender at us.ibm.com Tue Feb 1 11:14:38 2011 From: mhender at us.ibm.com (Michael E Henderson) Date: Tue, 1 Feb 2011 12:14:38 -0500 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse In-Reply-To: References: Message-ID: Thanks ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ From: Matthew Knepley To: PETSc users list Date: 02/01/2011 10:44 AM Subject: Re: [petsc-users] options for gmres w/ ilu precondioner for sparse Sent by: petsc-users-bounces at mcs.anl.gov On Tue, Feb 1, 2011 at 9:03 AM, Michael E Henderson wrote: Good morning, I'm trying to use gmres with an ilu preconditioner and having trouble getting the options right. I figure it's got to be something simple, so hope it's an easy question. This is the joy of using other packages. Looking at the source, I see 4 packages which can factor a parallel (MPIAIJ) matrix: MUMPS, SuperLU_dist, Spooles (now unsupported), Pastix These are all usable from -pc_factor_mat_solver_package when using MPIAIJ. 1) We do not consider SPAI a matrix factorization package. You just use it with --pc_type spai 2) We cannot see inside Hypre, and thus it is hard to get into this framework. You use Euclid with -pc_type hypre -pc_hypre_type euclid Matt With options: -ksp_type gmres -pc_type ilu -pc_factor_levels 10 -pc_factor_fill 10 -pc_factor_mat_solver_package spai I get the message: unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver spai. Perhaps you must config/configure.py with --download-spai I checked the configuration output and spai was indeed configured and built. I also tried spooles with a similar result. The table http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html seems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is that true? -pc_factor_mat_solver_package hypre -pc_hypre_type euclid also gives unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver hypre. Perhaps you must config/configure.py with --download-hypre I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. Am I doing something obviously wrong? Thanks, Mike Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amesga1 at tigers.lsu.edu Tue Feb 1 14:14:32 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:14:32 -0600 Subject: [petsc-users] KspTrueResidualNorm Message-ID: Dear all, I'm using Ksp *gmres* with *boomeramg* preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand *KSPSetNormType* gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? 
Do I need to Introduce my own convergence test? Best, A. Mesgarnejad -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Feb 1 14:20:28 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 1 Feb 2011 14:20:28 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: Message-ID: You can use runtime option '-ksp_monitor_true_residual'. Hong On Tue, Feb 1, 2011 at 2:14 PM, Ataollah Mesgarnejad wrote: > Dear all, > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even > though it converges in preconditioned norm it doesn't converge in true norm, > but as I understand KSPSetNormType gmres does not support true residual > norm? Is that correct and If it is, is there any other way to monitor the > true norm? Do I need to Introduce my own convergence test? > > Best, > A. Mesgarnejad > From bsmith at mcs.anl.gov Tue Feb 1 14:21:49 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 14:21:49 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: Message-ID: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned Barry On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > Dear all, > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > Best, > A. Mesgarnejad From amesga1 at tigers.lsu.edu Tue Feb 1 14:32:17 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:32:17 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: Barry, I did as you said and now I receive these errors once I try to run the program: [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp Does right preconditiong work with *HYPRE*? Best, Ata On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > The simplest thing is to simply run with -ksp_monitor_true_residual and > see how the convergence is going in the true residual norm also. 
> > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > Barry > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > Dear all, > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > Best, > > A. Mesgarnejad > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 1 14:35:41 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 14:35:41 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Right preconditioner shouldn't matter to the preconditioner at all. What is the complete error message. Does it run on one process? Barry On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > Barry, > > I did as you said and now I receive these errors once I try to run the program: > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > Does right preconditiong work with HYPRE? > Best, > Ata > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. > > You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > Barry > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > Dear all, > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > > > Best, > > A. Mesgarnejad > > > > > -- > A. 
Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 From amesga1 at tigers.lsu.edu Tue Feb 1 14:35:38 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:35:38 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: Sorry for too many replies but it worked with -Ksp_monitor !!! I reported the same kind of problem before too.
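For reference, a minimal sketch of the kind of run being discussed in this thread, monitoring the true residual of a BoomerAMG-preconditioned GMRES solve; the executable name and process count are placeholders:

  mpiexec -n 5 ./myprog -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor_true_residual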
Best, Ata On Tue, Feb 1, 2011 at 2:32 PM, Ataollah Mesgarnejad wrote: > Barry, > > I did as you said and now I receive these errors once I try to run the > program: > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > Does right preconditiong work with *HYPRE*? > Best, > Ata > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > >> >> The simplest thing is to simply run with -ksp_monitor_true_residual and >> see how the convergence is going in the true residual norm also. >> >> You can also switch to right preconditioning with gmres and then the >> residual used by gmres is the true residual norm. Use >> -ksp_preconditioner_side right -ksp_norm_type unpreconditioned >> >> Barry >> >> >> >> On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: >> >> > Dear all, >> > >> > I'm using Ksp gmres with boomeramg preconditioning and I suspect that >> even though it converges in preconditioned norm it doesn't converge in true >> norm, but as I understand KSPSetNormType gmres does not support true >> residual norm? Is that correct and If it is, is there any other way to >> monitor the true norm? Do I need to Introduce my own convergence test? >> > >> > Best, >> > A. Mesgarnejad >> >> > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From amesga1 at tigers.lsu.edu Tue Feb 1 15:27:03 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 15:27:03 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: I compile it with openmpi and it runs on 5 cores. And thats all the error message from PETSC. 
the compelete output looks like this: [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun has exited due to process rank 3 with PID 3845 on node me-1203svr3.lsu.edu exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here). Best, Ata On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > Right preconditioner shouldn't matter to the preconditioner at all. What > is the complete error message. Does it run on one process? > > Barry > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > Barry, > > > > I did as you said and now I receive these errors once I try to run the > program: > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > Does right preconditiong work with HYPRE? > > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > > > The simplest thing is to simply run with -ksp_monitor_true_residual and > see how the convergence is going in the true residual norm also. > > > > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. 
Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > Barry > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > Dear all, > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > Best, > > > A. Mesgarnejad > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 15:32:22 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 15:32:22 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad wrote: > I compile it with openmpi and it runs on 5 cores. And thats all the error > message from PETSC. the compelete output looks like this: > Something is wrong with your output gathering. It would indicate the error, or a signal received. Matt > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > with errorcode 1. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun has exited due to process rank 3 with PID 3845 on > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). > > Best, > Ata > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > >> >> Right preconditioner shouldn't matter to the preconditioner at all. What >> is the complete error message. Does it run on one process? 
>> >> Barry >> >> On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: >> >> > Barry, >> > >> > I did as you said and now I receive these errors once I try to run the >> program: >> > >> > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in >> src/dm/da/utils/mhyp.c >> > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in >> src/ksp/pc/impls/hypre/hypre.c >> > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c >> > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp >> > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in >> src/dm/da/utils/mhyp.c >> > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in >> src/ksp/pc/impls/hypre/hypre.c >> > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c >> > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp >> > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c >> > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp >> > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp >> > >> > Does right preconditiong work with HYPRE? >> > Best, >> > Ata >> > >> > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: >> > >> > The simplest thing is to simply run with -ksp_monitor_true_residual >> and see how the convergence is going in the true residual norm also. >> > >> > You can also switch to right preconditioning with gmres and then the >> residual used by gmres is the true residual norm. Use >> -ksp_preconditioner_side right -ksp_norm_type unpreconditioned >> > >> > Barry >> > >> > >> > >> > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: >> > >> > > Dear all, >> > > >> > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that >> even though it converges in preconditioned norm it doesn't converge in true >> norm, but as I understand KSPSetNormType gmres does not support true >> residual norm? Is that correct and If it is, is there any other way to >> monitor the true norm? Do I need to Introduce my own convergence test? >> > > >> > > Best, >> > > A. Mesgarnejad >> > >> > >> > >> > >> > -- >> > A. Mesgarnejad >> > PhD Student, Research Assistant >> > Mechanical Engineering Department >> > Louisiana State University >> > 2203 Patrick F. Taylor Hall >> > Baton Rouge, La 70803 >> >> > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 1 15:34:45 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 15:34:45 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> Suggest running with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to see if there is some memory corruption. If that doesn't help suggest a build with MPICH to see if the same or a different problem happens. Getting a partial error message like this is not normal and rarely seen. 
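For reference, a minimal sketch of the kind of valgrind run suggested above, along the lines of the linked FAQ entry; the executable name, its options, and the process count are placeholders:

  # -malloc off bypasses PETSc's own allocation checking so valgrind sees the raw mallocs
  mpiexec -n 5 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./myprog -malloc off [your usual options]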
Barry On Feb 1, 2011, at 3:32 PM, Matthew Knepley wrote: > On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad wrote: > I compile it with openmpi and it runs on 5 cores. And thats all the error message from PETSC. the compelete output looks like this: > > Something is wrong with your output gathering. It would indicate the error, or a signal received. > > Matt > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > with errorcode 1. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun has exited due to process rank 3 with PID 3845 on > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). > > Best, > Ata > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > Right preconditioner shouldn't matter to the preconditioner at all. What is the complete error message. Does it run on one process? > > Barry > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > Barry, > > > > I did as you said and now I receive these errors once I try to run the program: > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > Does right preconditiong work with HYPRE? 
> > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > > > The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. > > > > You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > Barry > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > Dear all, > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > Best, > > > A. Mesgarnejad > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From amesga1 at tigers.lsu.edu Tue Feb 1 15:41:17 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 15:41:17 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> Message-ID: I already used Valgrind to check for memory and as far I can tell the program was working fine. At this stage my concern is mainly about the convergence of ksp solver. I can later try compiling with petsc-dev. Thank you for your time, Ata On Tue, Feb 1, 2011 at 3:34 PM, Barry Smith wrote: > > Suggest running with valgrind > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to > see if there is some memory corruption. If that doesn't help suggest a build > with MPICH to see if the same or a different problem happens. > > Getting a partial error message like this is not normal and rarely seen. > > Barry > > On Feb 1, 2011, at 3:32 PM, Matthew Knepley wrote: > > > On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad < > amesga1 at tigers.lsu.edu> wrote: > > I compile it with openmpi and it runs on 5 cores. And thats all the error > message from PETSC. the compelete output looks like this: > > > > Something is wrong with your output gathering. It would indicate the > error, or a signal received. 
> > > > Matt > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > -------------------------------------------------------------------------- > > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > > with errorcode 1. > > > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > > You may or may not see output from other processes, depending on > > exactly when Open MPI kills them. > > > -------------------------------------------------------------------------- > > > -------------------------------------------------------------------------- > > mpirun has exited due to process rank 3 with PID 3845 on > > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > > have caused other processes in the application to be > > terminated by signals sent by mpirun (as reported here). > > > > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > > > Right preconditioner shouldn't matter to the preconditioner at all. What > is the complete error message. Does it run on one process? > > > > Barry > > > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > > > Barry, > > > > > > I did as you said and now I receive these errors once I try to run the > program: > > > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > > > Does right preconditiong work with HYPRE? > > > Best, > > > Ata > > > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith > wrote: > > > > > > The simplest thing is to simply run with -ksp_monitor_true_residual > and see how the convergence is going in the true residual norm also. 
> > > > > > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > > > Barry > > > > > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > > > Dear all, > > > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > > > Best, > > > > A. Mesgarnejad > > > > > > > > > > > > > > > -- > > > A. Mesgarnejad > > > PhD Student, Research Assistant > > > Mechanical Engineering Department > > > Louisiana State University > > > 2203 Patrick F. Taylor Hall > > > Baton Rouge, La 70803 > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 16:46:29 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 16:46:29 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core Message-ID: Hi, I am trying to configure my petsc install with an MPI installation to make use of a dual quad-core desktop system running Ubuntu. But eventhough the configure/make process went through without problems, the scalability of the programs don't seem to reflect what I expected. My configure options are --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ --download-plapack=1 --download-mumps=1 --download-umfpack=yes --with-debugging=1 --with-errorchecking=yes Is there something else that needs to be done as part of the configure process to enable a decent scaling ? I am only comparing programs with mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the same time as noted from -log_summary. If it helps, I've been testing with snes/examples/tutorials/ex20.c for all purposes with a custom -grid parameter from command-line to control the number of unknowns. If there is something you've witnessed before in this configuration or if you need anything else to analyze the problem, do let me know. Thanks, Vijay From knepley at gmail.com Wed Feb 2 16:53:32 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 16:53:32 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan wrote: > Hi, > > I am trying to configure my petsc install with an MPI installation to > make use of a dual quad-core desktop system running Ubuntu. 
But > eventhough the configure/make process went through without problems, > the scalability of the programs don't seem to reflect what I expected. > My configure options are > > --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 > --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > --download-plapack=1 --download-mumps=1 --download-umfpack=yes > --with-debugging=1 --with-errorchecking=yes > 1) For performance studies, make a build using --with-debugging=0 2) Look at -log_summary for a breakdown of performance Matt > Is there something else that needs to be done as part of the configure > process to enable a decent scaling ? I am only comparing programs with > mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the > same time as noted from -log_summary. If it helps, I've been testing > with snes/examples/tutorials/ex20.c for all purposes with a custom > -grid parameter from command-line to control the number of unknowns. > > If there is something you've witnessed before in this configuration or > if you need anything else to analyze the problem, do let me know. > > Thanks, > Vijay > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 17:04:31 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 17:04:31 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Matt, The -with-debugging=1 option is certainly not meant for performance studies but I didn't expect it to yield the same cpu time as a single processor for snes/ex20 i.e., my runs with 1 and 2 processors take approximately the same amount of time for computation of solution. But I am currently configuring without debugging symbols and shall let you know what that yields. On a similar note, is there something extra that needs to be done to make use of multi-core machines while using MPI ? I am not sure if this is even related to PETSc but could be an MPI configuration option that maybe either I or the configure process is missing. All ideas are much appreciated. Vijay On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan > wrote: >> >> Hi, >> >> I am trying to configure my petsc install with an MPI installation to >> make use of a dual quad-core desktop system running Ubuntu. But >> eventhough the configure/make process went through without problems, >> the scalability of the programs don't seem to reflect what I expected. >> My configure options are >> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> --with-debugging=1 --with-errorchecking=yes > > 1) For performance studies, make a build using --with-debugging=0 > 2) Look at -log_summary for a breakdown of performance > ?? Matt > >> >> Is there something else that needs to be done as part of the configure >> process to enable a decent scaling ? 
I am only comparing programs with >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >> same time as noted from -log_summary. If it helps, I've been testing >> with snes/examples/tutorials/ex20.c for all purposes with a custom >> -grid parameter from command-line to control the number of unknowns. >> >> If there is something you've witnessed before in this configuration or >> if you need anything else to analyze the problem, do let me know. >> >> Thanks, >> Vijay > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From knepley at gmail.com Wed Feb 2 17:15:07 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 17:15:07 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan wrote: > Matt, > > The -with-debugging=1 option is certainly not meant for performance > studies but I didn't expect it to yield the same cpu time as a single > processor for snes/ex20 i.e., my runs with 1 and 2 processors take > approximately the same amount of time for computation of solution. But > I am currently configuring without debugging symbols and shall let you > know what that yields. > > On a similar note, is there something extra that needs to be done to > make use of multi-core machines while using MPI ? I am not sure if > this is even related to PETSc but could be an MPI configuration option > that maybe either I or the configure process is missing. All ideas are > much appreciated. Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most cheap multicore machines, there is a single memory bus, and thus using more cores gains you very little extra performance. I still suspect you are not actually running in parallel, because you usually see a small speedup. That is why I suggested looking at -log_summary since it tells you how many processes were run and breaks down the time. Matt > > Vijay > > On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: > > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan > > wrote: > >> > >> Hi, > >> > >> I am trying to configure my petsc install with an MPI installation to > >> make use of a dual quad-core desktop system running Ubuntu. But > >> eventhough the configure/make process went through without problems, > >> the scalability of the programs don't seem to reflect what I expected. > >> My configure options are > >> > >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 > >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >> --with-debugging=1 --with-errorchecking=yes > > > > 1) For performance studies, make a build using --with-debugging=0 > > 2) Look at -log_summary for a breakdown of performance > > Matt > > > >> > >> Is there something else that needs to be done as part of the configure > >> process to enable a decent scaling ? I am only comparing programs with > >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the > >> same time as noted from -log_summary. 
If it helps, I've been testing > >> with snes/examples/tutorials/ex20.c for all purposes with a custom > >> -grid parameter from command-line to control the number of unknowns. > >> > >> If there is something you've witnessed before in this configuration or > >> if you need anything else to analyze the problem, do let me know. > >> > >> Thanks, > >> Vijay > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 17:38:00 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 17:38:00 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Here's the performance statistic on 1 and 2 processor runs. /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary Max Max/Min Avg Total Time (sec): 8.452e+00 1.00000 8.452e+00 Objects: 1.470e+02 1.00000 1.470e+02 Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 4.440e+02 1.00000 /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary Max Max/Min Avg Total Time (sec): 7.851e+00 1.00000 7.851e+00 Objects: 2.000e+02 1.00000 2.000e+02 Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 MPI Reductions: 1.046e+03 1.00000 I am not entirely sure if I can make sense out of that statistic but if there is something more you need, please feel free to let me know. Vijay On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: > On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan > wrote: >> >> Matt, >> >> The -with-debugging=1 option is certainly not meant for performance >> studies but I didn't expect it to yield the same cpu time as a single >> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >> approximately the same amount of time for computation of solution. But >> I am currently configuring without debugging symbols and shall let you >> know what that yields. >> >> On a similar note, is there something extra that needs to be done to >> make use of multi-core machines while using MPI ? I am not sure if >> this is even related to PETSc but could be an MPI configuration option >> that maybe either I or the configure process is missing. All ideas are >> much appreciated. > > Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most > cheap multicore machines, there is a single memory bus, and thus using more > cores gains you very little extra performance. I still suspect you are not > actually > running in parallel, because you usually see a small speedup. That is why I > suggested looking at -log_summary since it tells you how many processes were > run and breaks down the time. > ?? Matt > >> >> Vijay >> >> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >> > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >> > wrote: >> >> >> >> Hi, >> >> >> >> I am trying to configure my petsc install with an MPI installation to >> >> make use of a dual quad-core desktop system running Ubuntu. But >> >> eventhough the configure/make process went through without problems, >> >> the scalability of the programs don't seem to reflect what I expected. >> >> My configure options are >> >> >> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >> >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >> >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> >> --with-debugging=1 --with-errorchecking=yes >> > >> > 1) For performance studies, make a build using --with-debugging=0 >> > 2) Look at -log_summary for a breakdown of performance >> > ?? Matt >> > >> >> >> >> Is there something else that needs to be done as part of the configure >> >> process to enable a decent scaling ? I am only comparing programs with >> >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >> >> same time as noted from -log_summary. If it helps, I've been testing >> >> with snes/examples/tutorials/ex20.c for all purposes with a custom >> >> -grid parameter from command-line to control the number of unknowns. >> >> >> >> If there is something you've witnessed before in this configuration or >> >> if you need anything else to analyze the problem, do let me know. >> >> >> >> Thanks, >> >> Vijay >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From bsmith at mcs.anl.gov Wed Feb 2 18:06:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 2 Feb 2011 18:06:29 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: We need all the information from -log_summary to see what is going on. Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. Barry On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > Here's the performance statistic on 1 and 2 processor runs. 
> > /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary > > Max Max/Min Avg Total > Time (sec): 8.452e+00 1.00000 8.452e+00 > Objects: 1.470e+02 1.00000 1.470e+02 > Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 4.440e+02 1.00000 > > /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary > > Max Max/Min Avg Total > Time (sec): 7.851e+00 1.00000 7.851e+00 > Objects: 2.000e+02 1.00000 2.000e+02 > Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > MPI Reductions: 1.046e+03 1.00000 > > I am not entirely sure if I can make sense out of that statistic but > if there is something more you need, please feel free to let me know. > > Vijay > > On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >> wrote: >>> >>> Matt, >>> >>> The -with-debugging=1 option is certainly not meant for performance >>> studies but I didn't expect it to yield the same cpu time as a single >>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>> approximately the same amount of time for computation of solution. But >>> I am currently configuring without debugging symbols and shall let you >>> know what that yields. >>> >>> On a similar note, is there something extra that needs to be done to >>> make use of multi-core machines while using MPI ? I am not sure if >>> this is even related to PETSc but could be an MPI configuration option >>> that maybe either I or the configure process is missing. All ideas are >>> much appreciated. >> >> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >> cheap multicore machines, there is a single memory bus, and thus using more >> cores gains you very little extra performance. I still suspect you are not >> actually >> running in parallel, because you usually see a small speedup. That is why I >> suggested looking at -log_summary since it tells you how many processes were >> run and breaks down the time. >> Matt >> >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am trying to configure my petsc install with an MPI installation to >>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>> eventhough the configure/make process went through without problems, >>>>> the scalability of the programs don't seem to reflect what I expected. >>>>> My configure options are >>>>> >>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>> --with-debugging=1 --with-errorchecking=yes >>>> >>>> 1) For performance studies, make a build using --with-debugging=0 >>>> 2) Look at -log_summary for a breakdown of performance >>>> Matt >>>> >>>>> >>>>> Is there something else that needs to be done as part of the configure >>>>> process to enable a decent scaling ? 
I am only comparing programs with >>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>> -grid parameter from command-line to control the number of unknowns. >>>>> >>>>> If there is something you've witnessed before in this configuration or >>>>> if you need anything else to analyze the problem, do let me know. >>>>> >>>>> Thanks, >>>>> Vijay >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments >>>> is infinitely more interesting than any results to which their >>>> experiments >>>> lead. >>>> -- Norbert Wiener >>>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener >> From vijay.m at gmail.com Wed Feb 2 18:17:45 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 18:17:45 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Barry, Please find attached the patch for the minor change to control the number of elements from command line for snes/ex20.c. I know that this can be achieved with -grid_x etc from command_line but thought this just made the typing for the refinement process a little easier. I apologize if there was any confusion. Also, find attached the full log summaries for -np=1 and -np=2. Thanks. Vijay On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: > > ?We need all the information from -log_summary to see what is going on. > > ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. > > ? Barry > > On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >> Here's the performance statistic on 1 and 2 processor runs. >> >> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >> >> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >> >> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >> >> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >> >> I am not entirely sure if I can make sense out of that statistic but >> if there is something more you need, please feel free to let me know. >> >> Vijay >> >> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. 
Mahadevan >>> wrote: >>>> >>>> Matt, >>>> >>>> The -with-debugging=1 option is certainly not meant for performance >>>> studies but I didn't expect it to yield the same cpu time as a single >>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>> approximately the same amount of time for computation of solution. But >>>> I am currently configuring without debugging symbols and shall let you >>>> know what that yields. >>>> >>>> On a similar note, is there something extra that needs to be done to >>>> make use of multi-core machines while using MPI ? I am not sure if >>>> this is even related to PETSc but could be an MPI configuration option >>>> that maybe either I or the configure process is missing. All ideas are >>>> much appreciated. >>> >>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>> cheap multicore machines, there is a single memory bus, and thus using more >>> cores gains you very little extra performance. I still suspect you are not >>> actually >>> running in parallel, because you usually see a small speedup. That is why I >>> suggested looking at -log_summary since it tells you how many processes were >>> run and breaks down the time. >>> ? ?Matt >>> >>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>> eventhough the configure/make process went through without problems, >>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>> My configure options are >>>>>> >>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>> --with-debugging=1 --with-errorchecking=yes >>>>> >>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>> 2) Look at -log_summary for a breakdown of performance >>>>> ? ?Matt >>>>> >>>>>> >>>>>> Is there something else that needs to be done as part of the configure >>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>> >>>>>> If there is something you've witnessed before in this configuration or >>>>>> if you need anything else to analyze the problem, do let me know. >>>>>> >>>>>> Thanks, >>>>>> Vijay >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments >>>>> is infinitely more interesting than any results to which their >>>>> experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments >>> is infinitely more interesting than any results to which their experiments >>> lead. >>> -- Norbert Wiener >>> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex20.patch Type: text/x-patch Size: 526 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np1.out Type: application/octet-stream Size: 11823 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np2.out Type: application/octet-stream Size: 12814 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 2 18:35:09 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 2 Feb 2011 18:35:09 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 which is 74 percent of the total solve time (and 84 percent of the flops). When 3/4th of the entire run is not parallel at all you cannot expect much speedup. If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. Barry On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > Barry, > > Please find attached the patch for the minor change to control the > number of elements from command line for snes/ex20.c. I know that this > can be achieved with -grid_x etc from command_line but thought this > just made the typing for the refinement process a little easier. I > apologize if there was any confusion. > > Also, find attached the full log summaries for -np=1 and -np=2. Thanks. > > Vijay > > On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >> >> We need all the information from -log_summary to see what is going on. >> >> Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >> >> Barry >> >> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >> >>> Here's the performance statistic on 1 and 2 processor runs. 
>>> >>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>> >>> Max Max/Min Avg Total >>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>> Objects: 1.470e+02 1.00000 1.470e+02 >>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 4.440e+02 1.00000 >>> >>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>> >>> Max Max/Min Avg Total >>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>> Objects: 2.000e+02 1.00000 2.000e+02 >>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>> MPI Reductions: 1.046e+03 1.00000 >>> >>> I am not entirely sure if I can make sense out of that statistic but >>> if there is something more you need, please feel free to let me know. >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Matt, >>>>> >>>>> The -with-debugging=1 option is certainly not meant for performance >>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>> approximately the same amount of time for computation of solution. But >>>>> I am currently configuring without debugging symbols and shall let you >>>>> know what that yields. >>>>> >>>>> On a similar note, is there something extra that needs to be done to >>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>> this is even related to PETSc but could be an MPI configuration option >>>>> that maybe either I or the configure process is missing. All ideas are >>>>> much appreciated. >>>> >>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>> cheap multicore machines, there is a single memory bus, and thus using more >>>> cores gains you very little extra performance. I still suspect you are not >>>> actually >>>> running in parallel, because you usually see a small speedup. That is why I >>>> suggested looking at -log_summary since it tells you how many processes were >>>> run and breaks down the time. >>>> Matt >>>> >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>> eventhough the configure/make process went through without problems, >>>>>>> the scalability of the programs don't seem to reflect what I expected. 
>>>>>>> My configure options are >>>>>>> >>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>> >>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>> Matt >>>>>> >>>>>>> >>>>>>> Is there something else that needs to be done as part of the configure >>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>> >>>>>>> If there is something you've witnessed before in this configuration or >>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>> >>>>>>> Thanks, >>>>>>> Vijay >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments >>>>>> is infinitely more interesting than any results to which their >>>>>> experiments >>>>>> lead. >>>>>> -- Norbert Wiener >>>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments >>>> is infinitely more interesting than any results to which their experiments >>>> lead. >>>> -- Norbert Wiener >>>> >> >> > From balay at mcs.anl.gov Wed Feb 2 18:53:50 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 2 Feb 2011 18:53:50 -0600 (CST) Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, 2 Feb 2011, Vijay S. Mahadevan wrote: > On a similar note, is there something extra that needs to be done to > make use of multi-core machines while using MPI ? I am not sure if > this is even related to PETSc but could be an MPI configuration option > that maybe either I or the configure process is missing. All ideas are > much appreciated. You can try '--download-mpich --download-mpich-device=ch3:nemesis' or '--download-openmpi' both with --with-debugging=0 - and see if they make any difference [you can have a different PETSC_ARCH for each build - and then compare] Satish From vijay.m at gmail.com Wed Feb 2 23:13:50 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 23:13:50 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: Barry, I understand what you are saying but which example/options then is the best one to compute the scalability in a multi-core machine ? I chose the nonlinear diffusion problem specifically because of its inherent stiffness that could lead probably provide noticeable scalability in a multi-core system. From your experience, do you think there is another example program that will demonstrate this much more rigorously or clearly ? Btw, I dont get good speedup even for 2 processes with ex20.c and that was the original motivation for this thread. Satish. 
I configured with --download-mpich now without the mpich-device. The results are given above. I will try with the options you provided although I dont entirely understand what they mean, which kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu ? Vijay On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > > ? Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) > > MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 > MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 > > which is 74 percent of the total solve time (and 84 percent of the flops). ? When 3/4th of the entire run is not parallel at all you cannot expect much speedup. ?If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. > > ?Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. > > ?You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. > > ?Barry > > > > On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Please find attached the patch for the minor change to control the >> number of elements from command line for snes/ex20.c. I know that this >> can be achieved with -grid_x etc from command_line but thought this >> just made the typing for the refinement process a little easier. I >> apologize if there was any confusion. >> >> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. >> >> Vijay >> >> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>> >>> ?We need all the information from -log_summary to see what is going on. >>> >>> ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>> >>> ? Barry >>> >>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>> >>>> Here's the performance statistic on 1 and 2 processor runs. >>>> >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>> >>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>> MPI Message Lengths: ?0.000e+00 ? ? 
?0.00000 ? 0.000e+00 ?0.000e+00 >>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>> >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>> >>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>> >>>> I am not entirely sure if I can make sense out of that statistic but >>>> if there is something more you need, please feel free to let me know. >>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Matt, >>>>>> >>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>> approximately the same amount of time for computation of solution. But >>>>>> I am currently configuring without debugging symbols and shall let you >>>>>> know what that yields. >>>>>> >>>>>> On a similar note, is there something extra that needs to be done to >>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>> much appreciated. >>>>> >>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>> cores gains you very little extra performance. I still suspect you are not >>>>> actually >>>>> running in parallel, because you usually see a small speedup. That is why I >>>>> suggested looking at -log_summary since it tells you how many processes were >>>>> run and breaks down the time. >>>>> ? ?Matt >>>>> >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>> My configure options are >>>>>>>> >>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>> >>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>> process to enable a decent scaling ? 
I am only comparing programs with >>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>> >>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vijay >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments >>>>>>> is infinitely more interesting than any results to which their >>>>>>> experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments >>>>> is infinitely more interesting than any results to which their experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>> >>> >> > > From knepley at gmail.com Wed Feb 2 23:18:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 23:18:46 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: On Wed, Feb 2, 2011 at 11:13 PM, Vijay S. Mahadevan wrote: > Barry, > > I understand what you are saying but which example/options then is the > best one to compute the scalability in a multi-core machine ? I chose > the nonlinear diffusion problem specifically because of its inherent > stiffness that could lead probably provide noticeable scalability in a > multi-core system. From your experience, do you think there is another > example program that will demonstrate this much more rigorously or > clearly ? Btw, I dont get good speedup even for 2 processes with > ex20.c and that was the original motivation for this thread. > Very simply, Barry said your coarse grid is way too big. Make it smaller and you will see speedup. Matt > Satish. I configured with --download-mpich now without the > mpich-device. The results are given above. I will try with the options > you provided although I dont entirely understand what they mean, which > kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > ? > > Vijay > > On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > > > > Ok, everything makes sense. Looks like you are using two level > multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant > -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid > problem redundantly on each process (each process is solving the entire > coarse grid solve using LU factorization). The time for the factorization is > (in the two process case) > > > > MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 > > MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 > > > > which is 74 percent of the total solve time (and 84 percent of the > flops). When 3/4th of the entire run is not parallel at all you cannot > expect much speedup. If you run with -snes_view it will display exactly the > solver being used. You cannot expect to understand the performance if you > don't understand what the solver is actually doing. 
Using a 20 by 20 by 20 > coarse grid is generally a bad idea since the code spends most of the time > there, stick with something like 5 by 5 by 5. > > > > Suggest running with the default grid and -dmmg_nlevels 5 now the > percent in the coarse solve will be a trivial percent of the run time. > > > > You should get pretty good speed up for 2 processes but not much better > speedup for four processes because as Matt noted the computation is memory > bandwidth limited; > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. > Note also that this is running multigrid which is a fast solver, but doesn't > parallel scale as well many slow algorithms. For example if you run > -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 > processors but crummy speed. > > > > Barry > > > > > > > > On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > > > >> Barry, > >> > >> Please find attached the patch for the minor change to control the > >> number of elements from command line for snes/ex20.c. I know that this > >> can be achieved with -grid_x etc from command_line but thought this > >> just made the typing for the refinement process a little easier. I > >> apologize if there was any confusion. > >> > >> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. > >> > >> Vijay > >> > >> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: > >>> > >>> We need all the information from -log_summary to see what is going on. > >>> > >>> Not sure what -grid 20 means but don't expect any good parallel > performance with less than at least 10,000 unknowns per process. > >>> > >>> Barry > >>> > >>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >>> > >>>> Here's the performance statistic on 1 and 2 processor runs. > >>>> > >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 > -log_summary > >>>> > >>>> Max Max/Min Avg Total > >>>> Time (sec): 8.452e+00 1.00000 8.452e+00 > >>>> Objects: 1.470e+02 1.00000 1.470e+02 > >>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > >>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > >>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>> MPI Reductions: 4.440e+02 1.00000 > >>>> > >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 > -log_summary > >>>> > >>>> Max Max/Min Avg Total > >>>> Time (sec): 7.851e+00 1.00000 7.851e+00 > >>>> Objects: 2.000e+02 1.00000 2.000e+02 > >>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > >>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > >>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > >>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > >>>> MPI Reductions: 1.046e+03 1.00000 > >>>> > >>>> I am not entirely sure if I can make sense out of that statistic but > >>>> if there is something more you need, please feel free to let me know. > >>>> > >>>> Vijay > >>>> > >>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley > wrote: > >>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> Matt, > >>>>>> > >>>>>> The -with-debugging=1 option is certainly not meant for performance > >>>>>> studies but I didn't expect it to yield the same cpu time as a > single > >>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take > >>>>>> approximately the same amount of time for computation of solution. 
> But > >>>>>> I am currently configuring without debugging symbols and shall let > you > >>>>>> know what that yields. > >>>>>> > >>>>>> On a similar note, is there something extra that needs to be done to > >>>>>> make use of multi-core machines while using MPI ? I am not sure if > >>>>>> this is even related to PETSc but could be an MPI configuration > option > >>>>>> that maybe either I or the configure process is missing. All ideas > are > >>>>>> much appreciated. > >>>>> > >>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On > most > >>>>> cheap multicore machines, there is a single memory bus, and thus > using more > >>>>> cores gains you very little extra performance. I still suspect you > are not > >>>>> actually > >>>>> running in parallel, because you usually see a small speedup. That is > why I > >>>>> suggested looking at -log_summary since it tells you how many > processes were > >>>>> run and breaks down the time. > >>>>> Matt > >>>>> > >>>>>> > >>>>>> Vijay > >>>>>> > >>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley > wrote: > >>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I am trying to configure my petsc install with an MPI installation > to > >>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But > >>>>>>>> eventhough the configure/make process went through without > problems, > >>>>>>>> the scalability of the programs don't seem to reflect what I > expected. > >>>>>>>> My configure options are > >>>>>>>> > >>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ > --download-mpich=1 > >>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > >>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >>>>>>>> --with-debugging=1 --with-errorchecking=yes > >>>>>>> > >>>>>>> 1) For performance studies, make a build using --with-debugging=0 > >>>>>>> 2) Look at -log_summary for a breakdown of performance > >>>>>>> Matt > >>>>>>> > >>>>>>>> > >>>>>>>> Is there something else that needs to be done as part of the > configure > >>>>>>>> process to enable a decent scaling ? I am only comparing programs > with > >>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately > the > >>>>>>>> same time as noted from -log_summary. If it helps, I've been > testing > >>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom > >>>>>>>> -grid parameter from command-line to control the number of > unknowns. > >>>>>>>> > >>>>>>>> If there is something you've witnessed before in this > configuration or > >>>>>>>> if you need anything else to analyze the problem, do let me know. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Vijay > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> What most experimenters take for granted before they begin their > >>>>>>> experiments > >>>>>>> is infinitely more interesting than any results to which their > >>>>>>> experiments > >>>>>>> lead. > >>>>>>> -- Norbert Wiener > >>>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > experiments > >>>>> is infinitely more interesting than any results to which their > experiments > >>>>> lead. 
> >>>>> -- Norbert Wiener > >>>>> > >>> > >>> > >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkuhlem at emory.edu Thu Feb 3 10:35:39 2011 From: vkuhlem at emory.edu (Verena Kuhlemann) Date: Thu, 3 Feb 2011 11:35:39 -0500 Subject: [petsc-users] LU vs. ILU Message-ID: Hello, I am somewhat confused about the usage of PCLU and PCILU in PETSc. It seems as if there are the same options to choose from for both. In particular, if I use PCLU and PCFactorSetFill(pc,5) don't I end up with an incomplete LU factorization? If I use PCLU with no other options set, will the factorization be complete? Thanks, Verena -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 3 10:45:43 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 10:45:43 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: On Thu, Feb 3, 2011 at 10:35 AM, Verena Kuhlemann wrote: > Hello, > > I am somewhat confused about the usage of PCLU and PCILU in PETSc. > It seems as if there are the same options to choose from for both. > In particular, if I use PCLU and PCFactorSetFill(pc,5) > don't I end up with an incomplete LU factorization? > If I use PCLU with no other options set, will the > factorization be complete? > LU ignores the fill option. It always gives the complete factorization. Matt > Thanks, > Verena > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 3 10:58:32 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 3 Feb 2011 10:58:32 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: Verena: > I am somewhat confused about the usage of PCLU and PCILU in PETSc. > It seems as if there are the same options to choose from for both. > In particular, if I use PCLU and PCFactorSetFill(pc,5) > don't I end up with an incomplete LU factorization? No. When the user-provided fill is not sufficient, PETSc will increase it to whatever LU requires and run the LU factorization. In this case, a few or more calls to malloc() will be made, which could be expensive. > If I use PCLU with no other options set, will the > factorization be complete? Yes, it will be complete. But providing good estimates for these options will make the computation efficient. Hong From knepley at gmail.com Thu Feb 3 11:07:09 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 11:07:09 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: On Thu, Feb 3, 2011 at 10:45 AM, Matthew Knepley wrote: > On Thu, Feb 3, 2011 at 10:35 AM, Verena Kuhlemann wrote: > >> Hello, >> >> I am somewhat confused about the usage of PCLU and PCILU in PETSc. >> It seems as if there are the same options to choose from for both. >> In particular, if I use PCLU and PCFactorSetFill(pc,5) >> don't I end up with an incomplete LU factorization? >> If I use PCLU with no other options set, will the >> factorization be complete? >> > > LU ignores the fill option. It always gives the complete factorization. > Hong is right. I did not mean ignore, but rather will always keep allocating.
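To make the distinction concrete, here is a minimal run-time sketch (the executable is shown as ./ex2, the standard KSP tutorial example, purely as a placeholder; the fill and level values are illustrative, not recommendations). With -pc_type lu the -pc_factor_fill value is only a memory-preallocation estimate and the factorization stays complete; with -pc_type ilu the -pc_factor_levels value actually changes the preconditioner by limiting the fill that is kept:

    # Complete LU: fill is just a preallocation guess; PETSc reallocates if it
    # turns out to be too small, but the factors are still exact.
    ./ex2 -ksp_type preonly -pc_type lu -pc_factor_fill 5 -ksp_view

    # Incomplete ILU(k): the levels option limits the fill that is kept, so it
    # changes the quality of the preconditioner itself.
    ./ex2 -ksp_type gmres -pc_type ilu -pc_factor_levels 2 -pc_factor_fill 5 -ksp_view

In either case -ksp_view prints the preconditioner that was actually built, which is an easy way to confirm which factorization you ended up with.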
Matt > Matt > > >> Thanks, >> Verena >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 11:10:07 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 11:10:07 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > Barry, > > I understand what you are saying but which example/options then is the > best one to compute the scalability in a multi-core machine ? I chose > the nonlinear diffusion problem specifically because of its inherent > stiffness that could lead probably provide noticeable scalability in a > multi-core system. From your experience, do you think there is another > example program that will demonstrate this much more rigorously or > clearly ? Btw, I dont get good speedup even for 2 processes with > ex20.c and that was the original motivation for this thread. Did you follow my instructions? Barry > > Satish. I configured with --download-mpich now without the > mpich-device. The results are given above. I will try with the options > you provided although I dont entirely understand what they mean, which > kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > ? > > Vijay > > On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >> >> Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) >> >> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >> >> which is 74 percent of the total solve time (and 84 percent of the flops). When 3/4th of the entire run is not parallel at all you cannot expect much speedup. If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. >> >> Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. >> >> You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. 
For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. >> >> Barry >> >> >> >> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Please find attached the patch for the minor change to control the >>> number of elements from command line for snes/ex20.c. I know that this >>> can be achieved with -grid_x etc from command_line but thought this >>> just made the typing for the refinement process a little easier. I >>> apologize if there was any confusion. >>> >>> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>>> >>>> We need all the information from -log_summary to see what is going on. >>>> >>>> Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>>> >>>> Barry >>>> >>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>> >>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Reductions: 4.440e+02 1.00000 >>>>> >>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>> MPI Reductions: 1.046e+03 1.00000 >>>>> >>>>> I am not entirely sure if I can make sense out of that statistic but >>>>> if there is something more you need, please feel free to let me know. >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Matt, >>>>>>> >>>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>>> approximately the same amount of time for computation of solution. But >>>>>>> I am currently configuring without debugging symbols and shall let you >>>>>>> know what that yields. >>>>>>> >>>>>>> On a similar note, is there something extra that needs to be done to >>>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>>> much appreciated. >>>>>> >>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>>> cores gains you very little extra performance. I still suspect you are not >>>>>> actually >>>>>> running in parallel, because you usually see a small speedup. 
That is why I >>>>>> suggested looking at -log_summary since it tells you how many processes were >>>>>> run and breaks down the time. >>>>>> Matt >>>>>> >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>>> My configure options are >>>>>>>>> >>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>> >>>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>> Matt >>>>>>>> >>>>>>>>> >>>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>>> >>>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vijay >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments >>>>>>>> is infinitely more interesting than any results to which their >>>>>>>> experiments >>>>>>>> lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments >>>>>> is infinitely more interesting than any results to which their experiments >>>>>> lead. >>>>>> -- Norbert Wiener >>>>>> >>>> >>>> >>> >> >> From vijay.m at gmail.com Thu Feb 3 11:37:33 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 11:37:33 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, Sorry about the delay in the reply. I did not have access to the system to test out what you said, until now. I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 processor time 1 114.2 2 89.45 4 81.01 The scaleup doesn't seem to be optimal, even with two processors. I am wondering if the fault is in the MPI configuration itself. Are these results as you would expect ? I can also send you the log_summary for all cases if that will help. 
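A minimal sketch of one way to collect the full logs for each process count in a single pass (the mpiexec path, the output file names, and the choice of 1/2/4 processes are placeholders; the solver options simply repeat the ones used for the timings above):

    # Use the mpiexec that belongs to the PETSc build, e.g. the one under
    # $PETSC_DIR/$PETSC_ARCH/bin, so the intended MPI is actually exercised.
    for np in 1 2 4; do
        mpiexec -n $np ./ex20 -dmmg_nlevels 5 -pc_type jacobi -log_summary \
            > ex20_np${np}.log 2>&1
    done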
Vijay On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: > > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> I understand what you are saying but which example/options then is the >> best one to compute the scalability in a multi-core machine ? I chose >> the nonlinear diffusion problem specifically because of its inherent >> stiffness that could lead probably provide noticeable scalability in a >> multi-core system. From your experience, do you think there is another >> example program that will demonstrate this much more rigorously or >> clearly ? Btw, I dont get good speedup even for 2 processes with >> ex20.c and that was the original motivation for this thread. > > ? Did you follow my instructions? > > ? Barry > >> >> Satish. I configured with --download-mpich now without the >> mpich-device. The results are given above. I will try with the options >> you provided although I dont entirely understand what they mean, which >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >> ? >> >> Vijay >> >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>> >>> ? Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) >>> >>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>> >>> which is 74 percent of the total solve time (and 84 percent of the flops). ? When 3/4th of the entire run is not parallel at all you cannot expect much speedup. ?If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. >>> >>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. >>> >>> ?You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. >>> >>> ?Barry >>> >>> >>> >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Please find attached the patch for the minor change to control the >>>> number of elements from command line for snes/ex20.c. I know that this >>>> can be achieved with -grid_x etc from command_line but thought this >>>> just made the typing for the refinement process a little easier. I >>>> apologize if there was any confusion. >>>> >>>> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. 
>>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>>>> >>>>> ?We need all the information from -log_summary to see what is going on. >>>>> >>>>> ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>>>> >>>>> ? Barry >>>>> >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>>>> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>>>> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>> >>>>>> I am not entirely sure if I can make sense out of that statistic but >>>>>> if there is something more you need, please feel free to let me know. >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Matt, >>>>>>>> >>>>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>>>> approximately the same amount of time for computation of solution. But >>>>>>>> I am currently configuring without debugging symbols and shall let you >>>>>>>> know what that yields. >>>>>>>> >>>>>>>> On a similar note, is there something extra that needs to be done to >>>>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>>>> much appreciated. >>>>>>> >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>>>> cores gains you very little extra performance. I still suspect you are not >>>>>>> actually >>>>>>> running in parallel, because you usually see a small speedup. That is why I >>>>>>> suggested looking at -log_summary since it tells you how many processes were >>>>>>> run and breaks down the time. >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>>>> My configure options are >>>>>>>>>> >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>> >>>>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>> ? ?Matt >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>>>> >>>>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments >>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>> experiments >>>>>>>>> lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>> >>>>> >>>> >>> >>> > > From knepley at gmail.com Thu Feb 3 11:42:57 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 11:42:57 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan wrote: > Barry, > > Sorry about the delay in the reply. I did not have access to the > system to test out what you said, until now. > > I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 > -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 > > processor time > 1 114.2 > 2 89.45 > 4 81.01 > 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from this data. 2) Do you know the memory bandwidth characteristics of this machine? That is crucial and you cannot begin to understand speedup on it until you do. Please look this up. 3) Worrying about specifics of the MPI implementation makes no sense until the basics are nailed down. Matt > The scaleup doesn't seem to be optimal, even with two processors. I am > wondering if the fault is in the MPI configuration itself. 
Are these > results as you would expect ? I can also send you the log_summary for > all cases if that will help. > > Vijay > > On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: > > > > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > > > >> Barry, > >> > >> I understand what you are saying but which example/options then is the > >> best one to compute the scalability in a multi-core machine ? I chose > >> the nonlinear diffusion problem specifically because of its inherent > >> stiffness that could lead probably provide noticeable scalability in a > >> multi-core system. From your experience, do you think there is another > >> example program that will demonstrate this much more rigorously or > >> clearly ? Btw, I dont get good speedup even for 2 processes with > >> ex20.c and that was the original motivation for this thread. > > > > Did you follow my instructions? > > > > Barry > > > >> > >> Satish. I configured with --download-mpich now without the > >> mpich-device. The results are given above. I will try with the options > >> you provided although I dont entirely understand what they mean, which > >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > >> ? > >> > >> Vijay > >> > >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > >>> > >>> Ok, everything makes sense. Looks like you are using two level > multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant > -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid > problem redundantly on each process (each process is solving the entire > coarse grid solve using LU factorization). The time for the factorization is > (in the two process case) > >>> > >>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 > >>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 > >>> > >>> which is 74 percent of the total solve time (and 84 percent of the > flops). When 3/4th of the entire run is not parallel at all you cannot > expect much speedup. If you run with -snes_view it will display exactly the > solver being used. You cannot expect to understand the performance if you > don't understand what the solver is actually doing. Using a 20 by 20 by 20 > coarse grid is generally a bad idea since the code spends most of the time > there, stick with something like 5 by 5 by 5. > >>> > >>> Suggest running with the default grid and -dmmg_nlevels 5 now the > percent in the coarse solve will be a trivial percent of the run time. > >>> > >>> You should get pretty good speed up for 2 processes but not much > better speedup for four processes because as Matt noted the computation is > memory bandwidth limited; > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. > Note also that this is running multigrid which is a fast solver, but doesn't > parallel scale as well many slow algorithms. For example if you run > -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 > processors but crummy speed. > >>> > >>> Barry > >>> > >>> > >>> > >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > >>> > >>>> Barry, > >>>> > >>>> Please find attached the patch for the minor change to control the > >>>> number of elements from command line for snes/ex20.c. I know that this > >>>> can be achieved with -grid_x etc from command_line but thought this > >>>> just made the typing for the refinement process a little easier. I > >>>> apologize if there was any confusion. 
> >>>> > >>>> Also, find attached the full log summaries for -np=1 and -np=2. > Thanks. > >>>> > >>>> Vijay > >>>> > >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith > wrote: > >>>>> > >>>>> We need all the information from -log_summary to see what is going > on. > >>>>> > >>>>> Not sure what -grid 20 means but don't expect any good parallel > performance with less than at least 10,000 unknowns per process. > >>>>> > >>>>> Barry > >>>>> > >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >>>>> > >>>>>> Here's the performance statistic on 1 and 2 processor runs. > >>>>>> > >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 > -log_summary > >>>>>> > >>>>>> Max Max/Min Avg Total > >>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 > >>>>>> Objects: 1.470e+02 1.00000 1.470e+02 > >>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > >>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > >>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>>>> MPI Reductions: 4.440e+02 1.00000 > >>>>>> > >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 > -log_summary > >>>>>> > >>>>>> Max Max/Min Avg Total > >>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 > >>>>>> Objects: 2.000e+02 1.00000 2.000e+02 > >>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > >>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > >>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > >>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > >>>>>> MPI Reductions: 1.046e+03 1.00000 > >>>>>> > >>>>>> I am not entirely sure if I can make sense out of that statistic but > >>>>>> if there is something more you need, please feel free to let me > know. > >>>>>> > >>>>>> Vijay > >>>>>> > >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley > wrote: > >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Matt, > >>>>>>>> > >>>>>>>> The -with-debugging=1 option is certainly not meant for > performance > >>>>>>>> studies but I didn't expect it to yield the same cpu time as a > single > >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take > >>>>>>>> approximately the same amount of time for computation of solution. > But > >>>>>>>> I am currently configuring without debugging symbols and shall let > you > >>>>>>>> know what that yields. > >>>>>>>> > >>>>>>>> On a similar note, is there something extra that needs to be done > to > >>>>>>>> make use of multi-core machines while using MPI ? I am not sure if > >>>>>>>> this is even related to PETSc but could be an MPI configuration > option > >>>>>>>> that maybe either I or the configure process is missing. All ideas > are > >>>>>>>> much appreciated. > >>>>>>> > >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On > most > >>>>>>> cheap multicore machines, there is a single memory bus, and thus > using more > >>>>>>> cores gains you very little extra performance. I still suspect you > are not > >>>>>>> actually > >>>>>>> running in parallel, because you usually see a small speedup. That > is why I > >>>>>>> suggested looking at -log_summary since it tells you how many > processes were > >>>>>>> run and breaks down the time. > >>>>>>> Matt > >>>>>>> > >>>>>>>> > >>>>>>>> Vijay > >>>>>>>> > >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley < > knepley at gmail.com> wrote: > >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan < > vijay.m at gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I am trying to configure my petsc install with an MPI > installation to > >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But > >>>>>>>>>> eventhough the configure/make process went through without > problems, > >>>>>>>>>> the scalability of the programs don't seem to reflect what I > expected. > >>>>>>>>>> My configure options are > >>>>>>>>>> > >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ > --download-mpich=1 > >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 > --download-hypre=1 > >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes > >>>>>>>>> > >>>>>>>>> 1) For performance studies, make a build using --with-debugging=0 > >>>>>>>>> 2) Look at -log_summary for a breakdown of performance > >>>>>>>>> Matt > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Is there something else that needs to be done as part of the > configure > >>>>>>>>>> process to enable a decent scaling ? I am only comparing > programs with > >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking > approximately the > >>>>>>>>>> same time as noted from -log_summary. If it helps, I've been > testing > >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a > custom > >>>>>>>>>> -grid parameter from command-line to control the number of > unknowns. > >>>>>>>>>> > >>>>>>>>>> If there is something you've witnessed before in this > configuration or > >>>>>>>>>> if you need anything else to analyze the problem, do let me > know. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Vijay > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> What most experimenters take for granted before they begin their > >>>>>>>>> experiments > >>>>>>>>> is infinitely more interesting than any results to which their > >>>>>>>>> experiments > >>>>>>>>> lead. > >>>>>>>>> -- Norbert Wiener > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> What most experimenters take for granted before they begin their > experiments > >>>>>>> is infinitely more interesting than any results to which their > experiments > >>>>>>> lead. > >>>>>>> -- Norbert Wiener > >>>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Thu Feb 3 12:05:15 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 12:05:15 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Matt, I apologize for the incomplete information. Find attached the log_summary for all the cases. The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with 2x2GB/2x4GB configuration. I do not know how to decipher the memory bandwidth with this information but if you need anything more, do let me know. VIjay On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: > On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. 
Mahadevan > wrote: >> >> Barry, >> >> Sorry about the delay in the reply. I did not have access to the >> system to test out what you said, until now. >> >> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >> >> processor ? ? ? time >> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 > > 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from > this data. > 2) Do you know the memory bandwidth characteristics of this machine? That is > crucial and > ?? ?you cannot begin to understand speedup on it until you do. Please look > this up. > 3) Worrying about specifics of the MPI implementation makes no sense until > the basics are nailed down. > ?? Matt > >> >> The scaleup doesn't seem to be optimal, even with two processors. I am >> wondering if the fault is in the MPI configuration itself. Are these >> results as you would expect ? I can also send you the log_summary for >> all cases if that will help. >> >> Vijay >> >> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >> > >> > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >> > >> >> Barry, >> >> >> >> I understand what you are saying but which example/options then is the >> >> best one to compute the scalability in a multi-core machine ? I chose >> >> the nonlinear diffusion problem specifically because of its inherent >> >> stiffness that could lead probably provide noticeable scalability in a >> >> multi-core system. From your experience, do you think there is another >> >> example program that will demonstrate this much more rigorously or >> >> clearly ? Btw, I dont get good speedup even for 2 processes with >> >> ex20.c and that was the original motivation for this thread. >> > >> > ? Did you follow my instructions? >> > >> > ? Barry >> > >> >> >> >> Satish. I configured with --download-mpich now without the >> >> mpich-device. The results are given above. I will try with the options >> >> you provided although I dont entirely understand what they mean, which >> >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >> >> ? >> >> >> >> Vijay >> >> >> >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >> >>> >> >>> ? Ok, everything makes sense. Looks like you are using two level >> >>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >> >>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >> >>> problem redundantly on each process (each process is solving the entire >> >>> coarse grid solve using LU factorization). The time for the factorization is >> >>> (in the two process case) >> >>> >> >>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >> >>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >> >>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >> >>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >> >>> >> >>> which is 74 percent of the total solve time (and 84 percent of the >> >>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >> >>> expect much speedup. ?If you run with -snes_view it will display exactly the >> >>> solver being used. You cannot expect to understand the performance if you >> >>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >> >>> coarse grid is generally a bad idea since the code spends most of the time >> >>> there, stick with something like 5 by 5 by 5. 
>> >>> >> >>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >> >>> percent in the coarse solve will be a trivial percent of the run time. >> >>> >> >>> ?You should get pretty good speed up for 2 processes but not much >> >>> better speedup for four processes because as Matt noted the computation is >> >>> memory bandwidth limited; >> >>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >> >>> also that this is running multigrid which is a fast solver, but doesn't >> >>> parallel scale as well many slow algorithms. For example if you run >> >>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >> >>> processors but crummy speed. >> >>> >> >>> ?Barry >> >>> >> >>> >> >>> >> >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >> >>> >> >>>> Barry, >> >>>> >> >>>> Please find attached the patch for the minor change to control the >> >>>> number of elements from command line for snes/ex20.c. I know that >> >>>> this >> >>>> can be achieved with -grid_x etc from command_line but thought this >> >>>> just made the typing for the refinement process a little easier. I >> >>>> apologize if there was any confusion. >> >>>> >> >>>> Also, find attached the full log summaries for -np=1 and -np=2. >> >>>> Thanks. >> >>>> >> >>>> Vijay >> >>>> >> >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >> >>>> wrote: >> >>>>> >> >>>>> ?We need all the information from -log_summary to see what is going >> >>>>> on. >> >>>>> >> >>>>> ?Not sure what -grid 20 means but don't expect any good parallel >> >>>>> performance with less than at least 10,000 unknowns per process. >> >>>>> >> >>>>> ? Barry >> >>>>> >> >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >> >>>>> >> >>>>>> Here's the performance statistic on 1 and 2 processor runs. >> >>>>>> >> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >> >>>>>> -log_summary >> >>>>>> >> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> >>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >> >>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >> >>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >> >>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >> >>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> >>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> >>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >> >>>>>> >> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >> >>>>>> -log_summary >> >>>>>> >> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> >>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >> >>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >> >>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >> >>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >> >>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >> >>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >> >>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >> >>>>>> >> >>>>>> I am not entirely sure if I can make sense out of that statistic >> >>>>>> but >> >>>>>> if there is something more you need, please feel free to let me >> >>>>>> know. 
>> >>>>>> >> >>>>>> Vijay >> >>>>>> >> >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >> >>>>>> wrote: >> >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >> >>>>>>> >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> Matt, >> >>>>>>>> >> >>>>>>>> The -with-debugging=1 option is certainly not meant for >> >>>>>>>> performance >> >>>>>>>> studies but I didn't expect it to yield the same cpu time as a >> >>>>>>>> single >> >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >> >>>>>>>> take >> >>>>>>>> approximately the same amount of time for computation of >> >>>>>>>> solution. But >> >>>>>>>> I am currently configuring without debugging symbols and shall >> >>>>>>>> let you >> >>>>>>>> know what that yields. >> >>>>>>>> >> >>>>>>>> On a similar note, is there something extra that needs to be done >> >>>>>>>> to >> >>>>>>>> make use of multi-core machines while using MPI ? I am not sure >> >>>>>>>> if >> >>>>>>>> this is even related to PETSc but could be an MPI configuration >> >>>>>>>> option >> >>>>>>>> that maybe either I or the configure process is missing. All >> >>>>>>>> ideas are >> >>>>>>>> much appreciated. >> >>>>>>> >> >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >> >>>>>>> On most >> >>>>>>> cheap multicore machines, there is a single memory bus, and thus >> >>>>>>> using more >> >>>>>>> cores gains you very little extra performance. I still suspect you >> >>>>>>> are not >> >>>>>>> actually >> >>>>>>> running in parallel, because you usually see a small speedup. That >> >>>>>>> is why I >> >>>>>>> suggested looking at -log_summary since it tells you how many >> >>>>>>> processes were >> >>>>>>> run and breaks down the time. >> >>>>>>> ? ?Matt >> >>>>>>> >> >>>>>>>> >> >>>>>>>> Vijay >> >>>>>>>> >> >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >> >>>>>>>> wrote: >> >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >> >>>>>>>>> >> >>>>>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>> Hi, >> >>>>>>>>>> >> >>>>>>>>>> I am trying to configure my petsc install with an MPI >> >>>>>>>>>> installation to >> >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >> >>>>>>>>>> eventhough the configure/make process went through without >> >>>>>>>>>> problems, >> >>>>>>>>>> the scalability of the programs don't seem to reflect what I >> >>>>>>>>>> expected. >> >>>>>>>>>> My configure options are >> >>>>>>>>>> >> >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >> >>>>>>>>>> --download-mpich=1 >> >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >> >>>>>>>>>> --download-hypre=1 >> >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >> >>>>>>>>> >> >>>>>>>>> 1) For performance studies, make a build using >> >>>>>>>>> --with-debugging=0 >> >>>>>>>>> 2) Look at -log_summary for a breakdown of performance >> >>>>>>>>> ? ?Matt >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Is there something else that needs to be done as part of the >> >>>>>>>>>> configure >> >>>>>>>>>> process to enable a decent scaling ? I am only comparing >> >>>>>>>>>> programs with >> >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >> >>>>>>>>>> approximately the >> >>>>>>>>>> same time as noted from -log_summary. 
If it helps, I've been >> >>>>>>>>>> testing >> >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >> >>>>>>>>>> custom >> >>>>>>>>>> -grid parameter from command-line to control the number of >> >>>>>>>>>> unknowns. >> >>>>>>>>>> >> >>>>>>>>>> If there is something you've witnessed before in this >> >>>>>>>>>> configuration or >> >>>>>>>>>> if you need anything else to analyze the problem, do let me >> >>>>>>>>>> know. >> >>>>>>>>>> >> >>>>>>>>>> Thanks, >> >>>>>>>>>> Vijay >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -- >> >>>>>>>>> What most experimenters take for granted before they begin their >> >>>>>>>>> experiments >> >>>>>>>>> is infinitely more interesting than any results to which their >> >>>>>>>>> experiments >> >>>>>>>>> lead. >> >>>>>>>>> -- Norbert Wiener >> >>>>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> What most experimenters take for granted before they begin their >> >>>>>>> experiments >> >>>>>>> is infinitely more interesting than any results to which their >> >>>>>>> experiments >> >>>>>>> lead. >> >>>>>>> -- Norbert Wiener >> >>>>>>> >> >>>>> >> >>>>> >> >>>> >> >>> >> >>> >> > >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np1.out Type: application/octet-stream Size: 12365 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np2.out Type: application/octet-stream Size: 13469 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np4.out Type: application/octet-stream Size: 14749 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Feb 3 13:17:28 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 13:17:28 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Vljay Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ 1 process VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 2 processes VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. 
3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). Barry On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: > Matt, > > I apologize for the incomplete information. Find attached the > log_summary for all the cases. > > The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with > 2x2GB/2x4GB configuration. I do not know how to decipher the memory > bandwidth with this information but if you need anything more, do let > me know. > > VIjay > > On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >> wrote: >>> >>> Barry, >>> >>> Sorry about the delay in the reply. I did not have access to the >>> system to test out what you said, until now. >>> >>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>> >>> processor time >>> 1 114.2 >>> 2 89.45 >>> 4 81.01 >> >> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >> this data. >> 2) Do you know the memory bandwidth characteristics of this machine? That is >> crucial and >> you cannot begin to understand speedup on it until you do. Please look >> this up. >> 3) Worrying about specifics of the MPI implementation makes no sense until >> the basics are nailed down. >> Matt >> >>> >>> The scaleup doesn't seem to be optimal, even with two processors. I am >>> wondering if the fault is in the MPI configuration itself. Are these >>> results as you would expect ? I can also send you the log_summary for >>> all cases if that will help. >>> >>> Vijay >>> >>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>> >>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, >>>>> >>>>> I understand what you are saying but which example/options then is the >>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>> the nonlinear diffusion problem specifically because of its inherent >>>>> stiffness that could lead probably provide noticeable scalability in a >>>>> multi-core system. From your experience, do you think there is another >>>>> example program that will demonstrate this much more rigorously or >>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>> ex20.c and that was the original motivation for this thread. >>>> >>>> Did you follow my instructions? >>>> >>>> Barry >>>> >>>>> >>>>> Satish. I configured with --download-mpich now without the >>>>> mpich-device. The results are given above. I will try with the options >>>>> you provided although I dont entirely understand what they mean, which >>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>> ? >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>> >>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>> problem redundantly on each process (each process is solving the entire >>>>>> coarse grid solve using LU factorization). 
The time for the factorization is >>>>>> (in the two process case) >>>>>> >>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>> >>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>> solver being used. You cannot expect to understand the performance if you >>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>> there, stick with something like 5 by 5 by 5. >>>>>> >>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>> >>>>>> You should get pretty good speed up for 2 processes but not much >>>>>> better speedup for four processes because as Matt noted the computation is >>>>>> memory bandwidth limited; >>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>> processors but crummy speed. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> Please find attached the patch for the minor change to control the >>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>> this >>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>> just made the typing for the refinement process a little easier. I >>>>>>> apologize if there was any confusion. >>>>>>> >>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>> Thanks. >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>> wrote: >>>>>>>> >>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>> on. >>>>>>>> >>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. 
>>>>>>>>> >>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>> -log_summary >>>>>>>>> >>>>>>>>> Max Max/Min Avg Total >>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>> >>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>> -log_summary >>>>>>>>> >>>>>>>>> Max Max/Min Avg Total >>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>> >>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>> but >>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>> know. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>> wrote: >>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Matt, >>>>>>>>>>> >>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>> performance >>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>> single >>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>> take >>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>> solution. But >>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>> let you >>>>>>>>>>> know what that yields. >>>>>>>>>>> >>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>> to >>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>> if >>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>> option >>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>> ideas are >>>>>>>>>>> much appreciated. >>>>>>>>>> >>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>> On most >>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>> using more >>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>> are not >>>>>>>>>> actually >>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>> is why I >>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>> processes were >>>>>>>>>> run and breaks down the time. >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>> wrote: >>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>> installation to >>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. 
But >>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>> problems, >>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>> expected. >>>>>>>>>>>>> My configure options are >>>>>>>>>>>>> >>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>> >>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>> configure >>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>> programs with >>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>> approximately the >>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>> testing >>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>> custom >>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>> unknowns. >>>>>>>>>>>>> >>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>> configuration or >>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>> know. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>> experiments >>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>> experiments >>>>>>>>>>>> lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments >>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>> experiments >>>>>>>>>> lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener >> > From jed at 59A2.org Thu Feb 3 13:25:26 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 16:25:26 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 16:17, Barry Smith wrote: > In src/benchmarks/streams you can run make test and have it generate a > report of how the streams benchmark is able to utilize the memory bandwidth. > Run that and send us the output (run with just 2 threads). 
That test does no software prefetch, is not vectorized (look at the assembly, you want all movapd and addpd/mulpd with memory addresses instead of addsd/mulsd or addpd/mulpd operating only on register operands), and is not NUMA-aware (which depending on the hardware, can cause performance problems). The output is still relevant and indicates what can be done without tuning, but does not accurately represent the peak achievable by the hardware. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 13:30:31 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 13:30:31 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> On Feb 3, 2011, at 1:25 PM, Jed Brown wrote: > On Thu, Feb 3, 2011 at 16:17, Barry Smith wrote: > In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). > > That test does no software prefetch, is not vectorized (look at the assembly, you want all movapd and addpd/mulpd with memory addresses instead of addsd/mulsd or addpd/mulpd operating only on register operands), and is not NUMA-aware (which depending on the hardware, can cause performance problems). The output is still relevant and indicates what can be done without tuning, but does not accurately represent the peak achievable by the hardware. Completely true. If you are aware of a "sophisticated" portable streams tester please add it to that directory. I'd love to have it. It gives an idea of what "code just compiled by the compiler can do" which is what we need in this situation, in particular what happens in going from 1 process to 2 processes. Barry From jed at 59A2.org Thu Feb 3 13:39:01 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 16:39:01 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 16:30, Barry Smith wrote: > Completely true. If you are aware of a "sophisticated" portable streams > tester please add it to that directory. I'd love to have it. Not portable, but I have some better code for x86/64. I believe Aron has something good for Blue Gene. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Thu Feb 3 13:41:44 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 13:41:44 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, Thanks for the quick reply. I ran the benchmark/stream/BasicVersion for one and two processes and the output are as follows: -n 1 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. 
Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 2529 microseconds. (= 2529 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 10161.8510 0.0032 0.0031 0.0037 Scale: 9843.6177 0.0034 0.0033 0.0038 Add: 10656.7114 0.0046 0.0045 0.0053 Triad: 10799.0448 0.0046 0.0044 0.0054 -n 2 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 4320 microseconds. (= 4320 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 5739.9704 0.0058 0.0056 0.0063 Scale: 5839.3617 0.0058 0.0055 0.0062 Add: 6116.9323 0.0081 0.0078 0.0085 Triad: 6021.0722 0.0084 0.0080 0.0088 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 2954 microseconds. (= 2954 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 6091.9448 0.0056 0.0053 0.0061 Scale: 5501.1775 0.0060 0.0058 0.0062 Add: 5960.4640 0.0084 0.0081 0.0087 Triad: 5936.2109 0.0083 0.0081 0.0089 I do not have OpenMP installed and so not sure if you wanted that when you said two threads. I also closed most of the applications that were open before running these tests and so they should hopefully be accurate. Vijay On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: > > ?Vljay > > ? 
Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes > > ------------------------------------------------------------------------------------------------------------------------ > Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total > ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > ?1 process > VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 > > ?2 processes > VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 > > ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) > > 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. > > 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. > > 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. > > ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). > > ? Barry > > > On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: > >> Matt, >> >> I apologize for the incomplete information. Find attached the >> log_summary for all the cases. >> >> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >> bandwidth with this information but if you need anything more, do let >> me know. >> >> VIjay >> >> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>> wrote: >>>> >>>> Barry, >>>> >>>> Sorry about the delay in the reply. I did not have access to the >>>> system to test out what you said, until now. >>>> >>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>> >>>> processor ? ? ? time >>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>> >>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>> this data. >>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>> crucial and >>> ? ? you cannot begin to understand speedup on it until you do. Please look >>> this up. >>> 3) Worrying about specifics of the MPI implementation makes no sense until >>> the basics are nailed down. >>> ? ?Matt >>> >>>> >>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>> wondering if the fault is in the MPI configuration itself. Are these >>>> results as you would expect ? 
I can also send you the log_summary for >>>> all cases if that will help. >>>> >>>> Vijay >>>> >>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>> >>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Barry, >>>>>> >>>>>> I understand what you are saying but which example/options then is the >>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>> multi-core system. From your experience, do you think there is another >>>>>> example program that will demonstrate this much more rigorously or >>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>> ex20.c and that was the original motivation for this thread. >>>>> >>>>> ? Did you follow my instructions? >>>>> >>>>> ? Barry >>>>> >>>>>> >>>>>> Satish. I configured with --download-mpich now without the >>>>>> mpich-device. The results are given above. I will try with the options >>>>>> you provided although I dont entirely understand what they mean, which >>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>> ? >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>> >>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>> (in the two process case) >>>>>>> >>>>>>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>> >>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>> >>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>> >>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>> memory bandwidth limited; >>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>> processors but crummy speed. >>>>>>> >>>>>>> ?Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. 
Mahadevan wrote: >>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>> this >>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>> apologize if there was any confusion. >>>>>>>> >>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>> Thanks. >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>> on. >>>>>>>>> >>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>> >>>>>>>>> ? Barry >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>> >>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>> -log_summary >>>>>>>>>> >>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>>>>>> >>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>> -log_summary >>>>>>>>>> >>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>>>>>> >>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>> but >>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>> know. >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>> wrote: >>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Matt, >>>>>>>>>>>> >>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>> performance >>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>> single >>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>> take >>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>> solution. But >>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>> let you >>>>>>>>>>>> know what that yields. 
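A note on reading the -log_summary headline numbers quoted above: the aggregate flop rate nearly doubles on two processes, but the total flop count also grows (the solver does extra work), so the wall-clock speedup is much smaller. The snippet below simply redoes that arithmetic with values copied from the quoted output; it is an illustration, not a measurement.

#include <stdio.h>

/* Headline numbers copied from the quoted -log_summary output for
   ./ex20 -grid 20 on 1 and 2 processes. */
int main(void)
{
  const double t1 = 8.452,   t2 = 7.851;    /* Time (sec), max over ranks */
  const double f1 = 5.045e9, f2 = 9.313e9;  /* Flops, total over ranks    */
  printf("wall-clock speedup        : %.2f\n", t1 / t2);
  printf("total flops, 2 vs 1 procs : %.2f x\n", f2 / f1);
  printf("aggregate flop-rate ratio : %.2f x\n", (f2 / t2) / (f1 / t1));
  return 0;
}

This prints a wall-clock speedup of about 1.08 even though the aggregate flop-rate ratio is about 1.99, because roughly 1.85 times as many flops are performed in the two-process run.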
>>>>>>>>>>>> >>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>> to >>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>> if >>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>> option >>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>> ideas are >>>>>>>>>>>> much appreciated. >>>>>>>>>>> >>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>> On most >>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>> using more >>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>> are not >>>>>>>>>>> actually >>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>> is why I >>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>> processes were >>>>>>>>>>> run and breaks down the time. >>>>>>>>>>> ? ?Matt >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>> installation to >>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>> problems, >>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>> expected. >>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>> >>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>> >>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>> configure >>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>> programs with >>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>> testing >>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>> custom >>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>> know. 
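To make the memory-bandwidth argument above concrete, here is a minimal compressed-sparse-row matrix-vector product, the kind of kernel that sits behind MatMult for AIJ matrices. This is an editorial sketch, not PETSc source: per nonzero it loads an 8-byte value, a 4-byte column index and an entry of x, yet performs only one multiply and one add, so the loop runs at the speed of the memory system rather than of the core's floating-point units. Adding cores that share a single memory bus therefore buys very little.

/* Minimal CSR sparse matrix-vector product y = A*x (illustrative sketch,
   not PETSc source).  The loop is memory-bandwidth bound: roughly 12+ bytes
   of traffic per nonzero for only 2 flops. */
void csr_matvec(int nrows, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
  for (int i = 0; i < nrows; i++) {
    double sum = 0.0;
    for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
      sum += val[j] * x[colind[j]];
    y[i] = sum;
  }
}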
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments >>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>> experiments >>>>>>>>>>> lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments >>> is infinitely more interesting than any results to which their experiments >>> lead. >>> -- Norbert Wiener >>> >> > > From bsmith at mcs.anl.gov Thu Feb 3 16:00:22 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 16:00:22 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() -------------- next part -------------- A non-text attachment was scrubbed... Name: BasicVersion.c Type: application/octet-stream Size: 5948 bytes Desc: not available URL: -------------- next part -------------- ; it is probably not a great way to do it, but better than nothing. Please try that one. Thanks Barry On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: > Barry, > > Thanks for the quick reply. I ran the benchmark/stream/BasicVersion > for one and two processes and the output are as follows: > > -n 1 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. > Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 2529 microseconds. > (= 2529 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 10161.8510 0.0032 0.0031 0.0037 > Scale: 9843.6177 0.0034 0.0033 0.0038 > Add: 10656.7114 0.0046 0.0045 0.0053 > Triad: 10799.0448 0.0046 0.0044 0.0054 > > -n 2 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. 
> Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 4320 microseconds. > (= 4320 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 5739.9704 0.0058 0.0056 0.0063 > Scale: 5839.3617 0.0058 0.0055 0.0062 > Add: 6116.9323 0.0081 0.0078 0.0085 > Triad: 6021.0722 0.0084 0.0080 0.0088 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. > Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 2954 microseconds. > (= 2954 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 6091.9448 0.0056 0.0053 0.0061 > Scale: 5501.1775 0.0060 0.0058 0.0062 > Add: 5960.4640 0.0084 0.0081 0.0087 > Triad: 5936.2109 0.0083 0.0081 0.0089 > > I do not have OpenMP installed and so not sure if you wanted that when > you said two threads. I also closed most of the applications that were > open before running these tests and so they should hopefully be > accurate. > > Vijay > > > On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >> >> Vljay >> >> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> 1 process >> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >> >> 2 processes >> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >> >> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >> >> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 
>> >> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >> >> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >> >> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >> >> Barry >> >> >> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >> >>> Matt, >>> >>> I apologize for the incomplete information. Find attached the >>> log_summary for all the cases. >>> >>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>> bandwidth with this information but if you need anything more, do let >>> me know. >>> >>> VIjay >>> >>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Barry, >>>>> >>>>> Sorry about the delay in the reply. I did not have access to the >>>>> system to test out what you said, until now. >>>>> >>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>> >>>>> processor time >>>>> 1 114.2 >>>>> 2 89.45 >>>>> 4 81.01 >>>> >>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>> this data. >>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>> crucial and >>>> you cannot begin to understand speedup on it until you do. Please look >>>> this up. >>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>> the basics are nailed down. >>>> Matt >>>> >>>>> >>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>> results as you would expect ? I can also send you the log_summary for >>>>> all cases if that will help. >>>>> >>>>> Vijay >>>>> >>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>> >>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> I understand what you are saying but which example/options then is the >>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>> multi-core system. From your experience, do you think there is another >>>>>>> example program that will demonstrate this much more rigorously or >>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>> ex20.c and that was the original motivation for this thread. >>>>>> >>>>>> Did you follow my instructions? >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Satish. I configured with --download-mpich now without the >>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>> ? 
>>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>> >>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>> (in the two process case) >>>>>>>> >>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>> >>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>> >>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>> >>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>> memory bandwidth limited; >>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>> processors but crummy speed. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>> this >>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>> apologize if there was any confusion. >>>>>>>>> >>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>> on. >>>>>>>>>> >>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. 
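The "74 percent of the solve is not parallel" observation above is Amdahl's law at work. The sketch below shows the resulting ceiling on speedup under the optimistic assumption that everything outside the redundant coarse-grid LU factorization scales perfectly; the 0.74 fraction is taken from the quoted log_summary analysis.

#include <stdio.h>

/* Best-case speedup when a fraction 'serial' of the run (the replicated
   coarse-grid LU solve, ~74% here) gets no benefit from extra processes.
   Assumes, optimistically, that the remaining 26% scales perfectly. */
int main(void)
{
  const double serial = 0.74;
  for (int p = 1; p <= 8; p *= 2)
    printf("p = %d  best-case speedup = %.2f\n",
           p, 1.0 / (serial + (1.0 - serial) / p));
  return 0;
}

Even with perfect parallelism in the rest of the code this caps the speedup at about 1.15 on two processes and about 1.35 as p grows large, which is why shrinking the coarse grid (or adding levels with -dmmg_nlevels) matters so much.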
>>>>>>>>>>> >>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>> -log_summary >>>>>>>>>>> >>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>> >>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>> -log_summary >>>>>>>>>>> >>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>> >>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>> but >>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>> know. >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>> wrote: >>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Matt, >>>>>>>>>>>>> >>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>> performance >>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>> single >>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>> take >>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>> solution. But >>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>> let you >>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>> >>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>> to >>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>> if >>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>> option >>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>> ideas are >>>>>>>>>>>>> much appreciated. >>>>>>>>>>>> >>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>> On most >>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>> using more >>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>> are not >>>>>>>>>>>> actually >>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>> is why I >>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>> processes were >>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> lead. >>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>> experiments >>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>> experiments >>>>>>>>>>>> lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments >>>> is infinitely more interesting than any results to which their experiments >>>> lead. >>>> -- Norbert Wiener >>>> >>> >> >> From vijay.m at gmail.com Thu Feb 3 16:29:22 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 16:29:22 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, The outputs are attached. 
I do not see a big difference from the earlier results as you mentioned. Let me know if there exist a similar benchmark that might help. Vijay On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: > > ? Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. > > ? I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() > ; it is probably not a great way to do it, but better than nothing. Please try that one. > > ? ?Thanks > > > ? Barry > > > On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >> for one and two processes and the output are as follows: >> >> -n 1 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. >> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 2529 microseconds. >> ? (= 2529 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? 10161.8510 ? ? ? 0.0032 ? ? ? 0.0031 ? ? ? 0.0037 >> Scale: ? ? ? 9843.6177 ? ? ? 0.0034 ? ? ? 0.0033 ? ? ? 0.0038 >> Add: ? ? ? ?10656.7114 ? ? ? 0.0046 ? ? ? 0.0045 ? ? ? 0.0053 >> Triad: ? ? ?10799.0448 ? ? ? 0.0046 ? ? ? 0.0044 ? ? ? 0.0054 >> >> -n 2 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. >> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 4320 microseconds. >> ? (= 4320 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? ?5739.9704 ? ? ? 0.0058 ? ? ? 0.0056 ? ? ? 0.0063 >> Scale: ? ? ? 5839.3617 ? ? ? 0.0058 ? ? ? 0.0055 ? ? ? 0.0062 >> Add: ? ? ? ? 6116.9323 ? ? ? 0.0081 ? ? ? 0.0078 ? ? ? 0.0085 >> Triad: ? ? ? 6021.0722 ? ? ? 0.0084 ? ? ? 0.0080 ? ? ? 0.0088 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. 
>> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 2954 microseconds. >> ? (= 2954 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? ?6091.9448 ? ? ? 0.0056 ? ? ? 0.0053 ? ? ? 0.0061 >> Scale: ? ? ? 5501.1775 ? ? ? 0.0060 ? ? ? 0.0058 ? ? ? 0.0062 >> Add: ? ? ? ? 5960.4640 ? ? ? 0.0084 ? ? ? 0.0081 ? ? ? 0.0087 >> Triad: ? ? ? 5936.2109 ? ? ? 0.0083 ? ? ? 0.0081 ? ? ? 0.0089 >> >> I do not have OpenMP installed and so not sure if you wanted that when >> you said two threads. I also closed most of the applications that were >> open before running these tests and so they should hopefully be >> accurate. >> >> Vijay >> >> >> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>> >>> ?Vljay >>> >>> ? Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total >>> ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> ?1 process >>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 >>> >>> ?2 processes >>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 >>> >>> ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>> >>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>> >>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>> >>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>> >>> ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>> >>> ? Barry >>> >>> >>> On Feb 3, 2011, at 12:05 PM, Vijay S. 
Mahadevan wrote: >>> >>>> Matt, >>>> >>>> I apologize for the incomplete information. Find attached the >>>> log_summary for all the cases. >>>> >>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>> bandwidth with this information but if you need anything more, do let >>>> me know. >>>> >>>> VIjay >>>> >>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Barry, >>>>>> >>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>> system to test out what you said, until now. >>>>>> >>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>> >>>>>> processor ? ? ? time >>>>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>>>> >>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>> this data. >>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>> crucial and >>>>> ? ? you cannot begin to understand speedup on it until you do. Please look >>>>> this up. >>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>> the basics are nailed down. >>>>> ? ?Matt >>>>> >>>>>> >>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>> results as you would expect ? I can also send you the log_summary for >>>>>> all cases if that will help. >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>> >>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>> >>>>>>> ? Did you follow my instructions? >>>>>>> >>>>>>> ? Barry >>>>>>> >>>>>>>> >>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>> ? >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>> >>>>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>> (in the two process case) >>>>>>>>> >>>>>>>>> MatLUFactorNum ? ? ? 
?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>>>> >>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>> >>>>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>> >>>>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>> memory bandwidth limited; >>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>> processors but crummy speed. >>>>>>>>> >>>>>>>>> ?Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Barry, >>>>>>>>>> >>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>> this >>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>> apologize if there was any confusion. >>>>>>>>>> >>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>>>> on. >>>>>>>>>>> >>>>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>> >>>>>>>>>>> ? Barry >>>>>>>>>>> >>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>> >>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>> -log_summary >>>>>>>>>>>> >>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? 
?1.00000 >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>> -log_summary >>>>>>>>>>>> >>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>>>>>>>> >>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>> but >>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>> know. >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>> >>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>> single >>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>> take >>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>> let you >>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>> to >>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>> if >>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>> option >>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>> On most >>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>> using more >>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>> are not >>>>>>>>>>>>> actually >>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>> is why I >>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>> processes were >>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments >>>>> is infinitely more interesting than any results to which their experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>>> >>> >>> > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: basicversion_np1.out Type: application/octet-stream Size: 999 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: basicversion_np2.out Type: application/octet-stream Size: 1999 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Feb 3 16:46:02 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 16:46:02 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. Barry * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > Barry, > > The outputs are attached. I do not see a big difference from the > earlier results as you mentioned. > > Let me know if there exist a similar benchmark that might help. > > Vijay > > On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >> >> Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >> >> I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >> ; it is probably not a great way to do it, but better than nothing. Please try that one. >> >> Thanks >> >> >> Barry >> >> >> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>> for one and two processes and the output are as follows: >>> >>> -n 1 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. >>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 2529 microseconds. >>> (= 2529 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 10161.8510 0.0032 0.0031 0.0037 >>> Scale: 9843.6177 0.0034 0.0033 0.0038 >>> Add: 10656.7114 0.0046 0.0045 0.0053 >>> Triad: 10799.0448 0.0046 0.0044 0.0054 >>> >>> -n 2 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. 
>>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 4320 microseconds. >>> (= 4320 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 5739.9704 0.0058 0.0056 0.0063 >>> Scale: 5839.3617 0.0058 0.0055 0.0062 >>> Add: 6116.9323 0.0081 0.0078 0.0085 >>> Triad: 6021.0722 0.0084 0.0080 0.0088 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. >>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 2954 microseconds. >>> (= 2954 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 6091.9448 0.0056 0.0053 0.0061 >>> Scale: 5501.1775 0.0060 0.0058 0.0062 >>> Add: 5960.4640 0.0084 0.0081 0.0087 >>> Triad: 5936.2109 0.0083 0.0081 0.0089 >>> >>> I do not have OpenMP installed and so not sure if you wanted that when >>> you said two threads. I also closed most of the applications that were >>> open before running these tests and so they should hopefully be >>> accurate. >>> >>> Vijay >>> >>> >>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>> >>>> Vljay >>>> >>>> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>> >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> 1 process >>>> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >>>> >>>> 2 processes >>>> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >>>> >>>> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>> >>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 
>>>> >>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>> >>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>> >>>> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>> >>>> Barry >>>> >>>> >>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Matt, >>>>> >>>>> I apologize for the incomplete information. Find attached the >>>>> log_summary for all the cases. >>>>> >>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>> bandwidth with this information but if you need anything more, do let >>>>> me know. >>>>> >>>>> VIjay >>>>> >>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>> system to test out what you said, until now. >>>>>>> >>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>> >>>>>>> processor time >>>>>>> 1 114.2 >>>>>>> 2 89.45 >>>>>>> 4 81.01 >>>>>> >>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>> this data. >>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>> crucial and >>>>>> you cannot begin to understand speedup on it until you do. Please look >>>>>> this up. >>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>> the basics are nailed down. >>>>>> Matt >>>>>> >>>>>>> >>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>> all cases if that will help. >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>> >>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>> >>>>>>>> Did you follow my instructions? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>>> >>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>> mpich-device. The results are given above. 
I will try with the options >>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>> ? >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>> >>>>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>> (in the two process case) >>>>>>>>>> >>>>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>>>> >>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>> >>>>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>>> >>>>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>> memory bandwidth limited; >>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>> processors but crummy speed. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Barry, >>>>>>>>>>> >>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>> this >>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>> >>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>>>> on. >>>>>>>>>>>> >>>>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. 
Mahadevan wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>> -log_summary >>>>>>>>>>>>> >>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>> -log_summary >>>>>>>>>>>>> >>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>>>> >>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>> but >>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>> know. >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>> single >>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>> take >>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>> option >>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>> On most >>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>> using more >>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>> are not >>>>>>>>>>>>>> actually >>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>> is why I >>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>> processes were >>>>>>>>>>>>>> run and breaks down the time. 
>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> lead. >>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments >>>>>> is infinitely more interesting than any results to which their experiments >>>>>> lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>> >>>> >>>> >> >> >> > From vijay.m at gmail.com Thu Feb 3 17:31:04 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 17:31:04 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, That sucks. I am sure that it is not a single processor machine although I've not yet opened it up and checked it for sure ;) It is dual booted with windows and I am going to use the Intel performance counters to find the bandwidth limit in windows/linux. Also, I did find a benchmark for Ubuntu after bit of searching around and will try to see if it can provide more details. Here are the links for the benchmarks. http://software.intel.com/en-us/articles/intel-performance-counter-monitor/ http://manpages.ubuntu.com/manpages/maverick/lmbench.8.html Hopefully the numbers from Windows and Ubuntu will match and if not, maybe my Ubuntu configuration needs a bit of tweaking to get this correct. I will keep you updated if I find something interesting. Thanks for all the helpful comments ! Vijay On Thu, Feb 3, 2011 at 4:46 PM, Barry Smith wrote: > > ? Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. > > ? Barry > > * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. > > On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> The outputs are attached. I do not see a big difference from the >> earlier results as you mentioned. >> >> Let me know if there exist a similar benchmark that might help. >> >> Vijay >> >> On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >>> >>> ? Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >>> >>> ? I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >>> ; it is probably not a great way to do it, but better than nothing. Please try that one. >>> >>> ? ?Thanks >>> >>> >>> ? Barry >>> >>> >>> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>>> for one and two processes and the output are as follows: >>>> >>>> -n 1 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 2529 microseconds. >>>> ? (= 2529 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. 
>>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? 10161.8510 ? ? ? 0.0032 ? ? ? 0.0031 ? ? ? 0.0037 >>>> Scale: ? ? ? 9843.6177 ? ? ? 0.0034 ? ? ? 0.0033 ? ? ? 0.0038 >>>> Add: ? ? ? ?10656.7114 ? ? ? 0.0046 ? ? ? 0.0045 ? ? ? 0.0053 >>>> Triad: ? ? ?10799.0448 ? ? ? 0.0046 ? ? ? 0.0044 ? ? ? 0.0054 >>>> >>>> -n 2 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 4320 microseconds. >>>> ? (= 4320 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. >>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? ?5739.9704 ? ? ? 0.0058 ? ? ? 0.0056 ? ? ? 0.0063 >>>> Scale: ? ? ? 5839.3617 ? ? ? 0.0058 ? ? ? 0.0055 ? ? ? 0.0062 >>>> Add: ? ? ? ? 6116.9323 ? ? ? 0.0081 ? ? ? 0.0078 ? ? ? 0.0085 >>>> Triad: ? ? ? 6021.0722 ? ? ? 0.0084 ? ? ? 0.0080 ? ? ? 0.0088 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 2954 microseconds. >>>> ? (= 2954 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. >>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? ?6091.9448 ? ? ? 0.0056 ? ? ? 0.0053 ? ? ? 0.0061 >>>> Scale: ? ? ? 5501.1775 ? ? ? 0.0060 ? ? ? 0.0058 ? ? ? 0.0062 >>>> Add: ? ? ? ? 5960.4640 ? ? ? 0.0084 ? ? ? 0.0081 ? ? ? 0.0087 >>>> Triad: ? ? ? 5936.2109 ? ? ? 0.0083 ? ? ? 0.0081 ? ? ? 0.0089 >>>> >>>> I do not have OpenMP installed and so not sure if you wanted that when >>>> you said two threads. I also closed most of the applications that were >>>> open before running these tests and so they should hopefully be >>>> accurate. 
>>>> >>>> Vijay >>>> >>>> >>>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>>> >>>>> ?Vljay >>>>> >>>>> ? Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>>> >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total >>>>> ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> ?1 process >>>>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 >>>>> >>>>> ?2 processes >>>>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 >>>>> >>>>> ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>>> >>>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>>>> >>>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>>> >>>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>>> >>>>> ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>>> >>>>> ? Barry >>>>> >>>>> >>>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Matt, >>>>>> >>>>>> I apologize for the incomplete information. Find attached the >>>>>> log_summary for all the cases. >>>>>> >>>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>>> bandwidth with this information but if you need anything more, do let >>>>>> me know. >>>>>> >>>>>> VIjay >>>>>> >>>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>>> system to test out what you said, until now. >>>>>>>> >>>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>>> >>>>>>>> processor ? ? ? time >>>>>>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>>>>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>>>>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>>>>>> >>>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>>> this data. >>>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>>> crucial and >>>>>>> ? ? you cannot begin to understand speedup on it until you do. Please look >>>>>>> this up. 
>>>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>>> the basics are nailed down. >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>>> all cases if that will help. >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Barry, >>>>>>>>>> >>>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>>> >>>>>>>>> ? Did you follow my instructions? >>>>>>>>> >>>>>>>>> ? Barry >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>>> ? >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>>> >>>>>>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>>> (in the two process case) >>>>>>>>>>> >>>>>>>>>>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>>>>>> >>>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>>> >>>>>>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. 
>>>>>>>>>>> >>>>>>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>>> memory bandwidth limited; >>>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>>> processors but crummy speed. >>>>>>>>>>> >>>>>>>>>>> ?Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>> >>>>>>>>>>>> Barry, >>>>>>>>>>>> >>>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>>> this >>>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>>> >>>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>>>>>> on. >>>>>>>>>>>>> >>>>>>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>>> >>>>>>>>>>>>> ? Barry >>>>>>>>>>>>> >>>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? 
?1.00000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>>> but >>>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>>> know. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>>> take >>>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>>> On most >>>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>>> using more >>>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>>> are not >>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>>> is why I >>>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>>> processes were >>>>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>> >>>>> >>>>> >>> >>> >>> >> > > From bsmith at mcs.anl.gov Thu Feb 3 17:57:30 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 17:57:30 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Feb 3, 2011, at 5:31 PM, Vijay S. Mahadevan wrote: > Barry, > > That sucks. I am sure that it is not a single processor machine > although I've not yet opened it up and checked it for sure ;) I didn't mean that it was literally a single processor machine, just effectively for iterative linear solvers. 
Barry > It is > dual booted with windows and I am going to use the Intel performance > counters to find the bandwidth limit in windows/linux. Also, I did > find a benchmark for Ubuntu after bit of searching around and will try > to see if it can provide more details. Here are the links for the > benchmarks. > > http://software.intel.com/en-us/articles/intel-performance-counter-monitor/ > http://manpages.ubuntu.com/manpages/maverick/lmbench.8.html > > Hopefully the numbers from Windows and Ubuntu will match and if not, > maybe my Ubuntu configuration needs a bit of tweaking to get this > correct. I will keep you updated if I find something interesting. > Thanks for all the helpful comments ! > > Vijay > > On Thu, Feb 3, 2011 at 4:46 PM, Barry Smith wrote: >> >> Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >> >> Barry >> >> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >> >> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> The outputs are attached. I do not see a big difference from the >>> earlier results as you mentioned. >>> >>> Let me know if there exist a similar benchmark that might help. >>> >>> Vijay >>> >>> On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >>>> >>>> Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >>>> >>>> I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >>>> ; it is probably not a great way to do it, but better than nothing. Please try that one. >>>> >>>> Thanks >>>> >>>> >>>> Barry >>>> >>>> >>>> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, >>>>> >>>>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>>>> for one and two processes and the output are as follows: >>>>> >>>>> -n 1 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 2529 microseconds. >>>>> (= 2529 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. 
>>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 10161.8510 0.0032 0.0031 0.0037 >>>>> Scale: 9843.6177 0.0034 0.0033 0.0038 >>>>> Add: 10656.7114 0.0046 0.0045 0.0053 >>>>> Triad: 10799.0448 0.0046 0.0044 0.0054 >>>>> >>>>> -n 2 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 4320 microseconds. >>>>> (= 4320 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. >>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 5739.9704 0.0058 0.0056 0.0063 >>>>> Scale: 5839.3617 0.0058 0.0055 0.0062 >>>>> Add: 6116.9323 0.0081 0.0078 0.0085 >>>>> Triad: 6021.0722 0.0084 0.0080 0.0088 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 2954 microseconds. >>>>> (= 2954 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. >>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 6091.9448 0.0056 0.0053 0.0061 >>>>> Scale: 5501.1775 0.0060 0.0058 0.0062 >>>>> Add: 5960.4640 0.0084 0.0081 0.0087 >>>>> Triad: 5936.2109 0.0083 0.0081 0.0089 >>>>> >>>>> I do not have OpenMP installed and so not sure if you wanted that when >>>>> you said two threads. I also closed most of the applications that were >>>>> open before running these tests and so they should hopefully be >>>>> accurate. 
>>>>> >>>>> Vijay >>>>> >>>>> >>>>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>>>> >>>>>> Vljay >>>>>> >>>>>> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>>>> >>>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>>> >>>>>> 1 process >>>>>> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >>>>>> >>>>>> 2 processes >>>>>> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >>>>>> >>>>>> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>>>> >>>>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>>>>> >>>>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>>>> >>>>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>>>> >>>>>> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Matt, >>>>>>> >>>>>>> I apologize for the incomplete information. Find attached the >>>>>>> log_summary for all the cases. >>>>>>> >>>>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>>>> bandwidth with this information but if you need anything more, do let >>>>>>> me know. >>>>>>> >>>>>>> VIjay >>>>>>> >>>>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>>>> system to test out what you said, until now. >>>>>>>>> >>>>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>>>> >>>>>>>>> processor time >>>>>>>>> 1 114.2 >>>>>>>>> 2 89.45 >>>>>>>>> 4 81.01 >>>>>>>> >>>>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>>>> this data. >>>>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>>>> crucial and >>>>>>>> you cannot begin to understand speedup on it until you do. Please look >>>>>>>> this up. >>>>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>>>> the basics are nailed down. 
>>>>>>>> Matt >>>>>>>> >>>>>>>>> >>>>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>>>> all cases if that will help. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Barry, >>>>>>>>>>> >>>>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>>>> >>>>>>>>>> Did you follow my instructions? >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>>>> ? >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>>>> >>>>>>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>>>> (in the two process case) >>>>>>>>>>>> >>>>>>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>>>>>> >>>>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>>>> >>>>>>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>>>>> >>>>>>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>>>> memory bandwidth limited; >>>>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. 
Note >>>>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>>>> processors but crummy speed. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Barry, >>>>>>>>>>>>> >>>>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>>>> this >>>>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>>>>>> on. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. 
Mahadevan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>>>> take >>>>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>>>> On most >>>>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>>>> using more >>>>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>>>> are not >>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>>>> is why I >>>>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>>>> processes were >>>>>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>>> lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >>> >> >> From jed at 59A2.org Thu Feb 3 20:33:48 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 23:33:48 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. 
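A rough sketch of the kind of commands meant here; the exact option spellings vary between MPI releases, so check mpiexec --help or the man pages on the actual install before copying these:

# serial STREAM run pinned to one core with taskset (util-linux)
taskset -c 0 ./BasicVersion

# Open MPI: spread the two ranks across sockets and bind them there
# (newer releases spell this --map-by socket --bind-to socket; the 1.4/1.5-era
#  releases used --bysocket --bind-to-socket)
mpiexec -n 2 --bysocket --bind-to-socket ./ex20 -log_summary

# MPICH with the Hydra process manager: ask Hydra for socket binding
# (the flag is spelled -binding or -bind-to socket depending on the MPICH version)
mpiexec -n 2 -bind-to socket ./ex20 -log_summary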
On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. Barry * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > Barry, > > The outputs are attached. I do... > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 20:54:58 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 20:54:58 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Feb 3, 2011, at 8:33 PM, Jed Brown wrote: > Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. We should add these options to the FAQ.html memory bandwidth question for everyone to easily look up. Barry > > >> On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: >> >> >> Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >> >> Barry >> >> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >> >> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >> >> > Barry, >> > >> > The outputs are attached. I do... >> >> > >> > From vijay.m at gmail.com Thu Feb 3 21:09:45 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 21:09:45 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: I currently have it configured with mpich using --download-mpich. I have not yet tried the mpich-device option that Satish suggested. Jed, is there a configure option to include the Hydra manager during MPI install ? I can also go the OpenMPI route and install the official Ubuntu distribution to use with PETSc. On a side-note, I installed some performance monitor tools in Ubuntu (http://manpages.ubuntu.com/manpages/lucid/man1/perf-stat.1.html) and ran the BasicVersion benchmark with it. Here are the logs.
Performance counter stats for '/home/vijay/karma/contrib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./BasicVersion': 853.205576 task-clock-msecs # 0.996 CPUs 107 context-switches # 0.000 M/sec 1 CPU-migrations # 0.000 M/sec 12453 page-faults # 0.015 M/sec 2981125976 cycles # 3494.030 M/sec 2463421266 instructions # 0.826 IPC 33455540 cache-references # 39.212 M/sec 30304359 cache-misses # 35.518 M/sec 0.856807560 seconds time elapsed Performance counter stats for '/home/vijay/karma/contrib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./BasicVersion': 2904.477114 task-clock-msecs # 1.982 CPUs 533 context-switches # 0.000 M/sec 3 CPU-migrations # 0.000 M/sec 24728 page-faults # 0.009 M/sec 9904814141 cycles # 3410.188 M/sec 4932342066 instructions # 0.498 IPC 108666258 cache-references # 37.413 M/sec 105503187 cache-misses # 36.324 M/sec 1.465376789 seconds time elapsed There is clearly something fishy about this. Next I am going to restart the machine and try the same without the gui to see if the memory access improves without all the default background processes running. Vijay On Thu, Feb 3, 2011 at 8:54 PM, Barry Smith wrote: > > On Feb 3, 2011, at 8:33 PM, Jed Brown wrote: > >> Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. > > ? We should add this options to the FAQ.html memory bandwidth question for everyone to easily look up. > > ? ?Barry > >> >> >>> On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: >>> >>> >>> ? Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >>> >>> ? Barry >>> >>> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >>> >>> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >>> >>> > Barry, >>> > >>> > The outputs are attached. I do... >>> >>> > >>> >> > > From travis.fisher at nasa.gov Fri Feb 4 08:02:08 2011 From: travis.fisher at nasa.gov (Travis C. Fisher) Date: Fri, 4 Feb 2011 09:02:08 -0500 Subject: [petsc-users] Preconditioning in matrix free SNES Message-ID: <4D4C06E0.5070601@nasa.gov> I am trying to precondition a matrix free SNES solution. The application is a high order compressible Navier Stokes solver. My implementation is: I use the -snes_mf_operator option and create a matrix free matrix via MatCreateSNESMF. The function SNESSetJacobian points to (FormJacobian) uses MatMFFDComputeJacobian to calculate the matrix free jacobian. I then use SNESSetFromOptions and extract the linear solver and preconditioner. The linear solver is GMRES. The matrix free jacobian solve with no preconditioner works pretty well for the smooth test problem I am currently working on, but in general I don't expect it to perform that well without preconditioning. I have attempted to perform the preconditioning in two ways: 1) Simply setting the preconditioner extracted from the SNES to PCLU. 
The preconditioner matrix I calculate in my FormJacobian routine is a frozen Jacobian matrix, so for the first linear solve, the preconditioned operator is essentially the identity operator. The code performs an LU decomposition, but the effect of the preconditioner is a larger linear residual, so clearly something is wrong. 2) I set the preconditioner from the SNES context to PCSHELL. I then make calls to PCShellSetSetup and PCShellSetApply to assign the routines for setting up and applying the PC. The routine for PCSetUp creates a new LU preconditioner and sets the operator the frozen jacobian matrix as described above via PCSetOperators. The PCApply just calls PCApply. Again the code performs the LU decomposition, but I get the same result as above. I realize this may point to my preconditioner matrix being completely wrong, but it "looks" right when I check values. It would take a lot of effort for me to set up the coloring to have petsc calculate the jacobian via finite differences since I am using a high order stencil with full boundary closures. My question is am I obviously doing something incorrectly? Am I somehow failing to direct the SNES context to apply the LU decomposition and not assume that I have given it a preconditioner matrix to simply perform a matrix multiply? I appreciate any direction you may be able to give me. Thanks, Travis Fisher From jed at 59A2.org Fri Feb 4 08:40:47 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 4 Feb 2011 11:40:47 -0300 Subject: [petsc-users] Preconditioning in matrix free SNES In-Reply-To: <4D4C06E0.5070601@nasa.gov> References: <4D4C06E0.5070601@nasa.gov> Message-ID: On Fri, Feb 4, 2011 at 11:02, Travis C. Fisher wrote: > I am trying to precondition a matrix free SNES solution. The application is > a high order compressible Navier Stokes solver. > What sort of high-order methods? > My implementation is: > > I use the -snes_mf_operator option and create a matrix free matrix via > MatCreateSNESMF. The function SNESSetJacobian points to (FormJacobian) uses > MatMFFDComputeJacobian to calculate the matrix free jacobian. I then use > SNESSetFromOptions and extract the linear solver and preconditioner. The > linear solver is GMRES. The matrix free jacobian solve with no > preconditioner works pretty well for the smooth test problem I am currently > working on, but in general I don't expect it to perform that well without > preconditioning. I have attempted to perform the preconditioning in two > ways: > > 1) Simply setting the preconditioner extracted from the SNES to PCLU. The > preconditioner matrix I calculate in my FormJacobian routine is a frozen > Jacobian matrix, so for the first linear solve, the preconditioned operator > is essentially the identity operator. The code performs an LU decomposition, > but the effect of the preconditioner is a larger linear residual, so clearly > something is wrong. > The assembled matrix is likely incorrect, try using a tiny problem size and -snes_type test. See also SNESSetLagJacobian http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/SNES/SNESSetLagJacobian.html . I realize this may point to my preconditioner matrix being completely wrong, > but it "looks" right when I check values. > How are you creating it and how do you know what the values should "look" like? > It would take a lot of effort for me to set up the coloring to have petsc > calculate the jacobian via finite differences since I am using a high order > stencil with full boundary closures. 
> That's what -snes_type test is for. > My question is am I obviously doing something incorrectly? > It sounds right to me, but run with -snes_view to see what's happening. > Am I somehow failing to direct the SNES context to apply the LU decomposition and not assume that I have given it a preconditioner matrix to simply perform a matrix multiply? > This almost never makes sense and it would be very hard to do accidentally: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/PC/PCMAT.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From travis.fisher at nasa.gov Fri Feb 4 13:16:32 2011
From: travis.fisher at nasa.gov (Travis C. Fisher)
Date: Fri, 4 Feb 2011 14:16:32 -0500
Subject: [petsc-users] Preconditioning in matrix free SNES
In-Reply-To: 
References: 
Message-ID: <4D4C5090.3000209@nasa.gov>

Jed, Thanks for the response. I think I have resolved this particular issue. My jacobian was incorrect. I use generalized coordinates and I missed a scaling factor. I found this by generating the matrix finite difference coloring and calculating the approximate jacobian, which was much easier than I originally anticipated. The jacobian I am trying to create is exact for convective terms based on the numerical methods (ESWENO finite difference schemes). Travis

From jed at 59A2.org Sat Feb 5 08:24:11 2011
From: jed at 59A2.org (Jed Brown)
Date: Sat, 5 Feb 2011 09:24:11 -0500
Subject: [petsc-users] Preconditioning in matrix free SNES
In-Reply-To: 
References: <4D4C5090.3000209@nasa.gov>
Message-ID: 

I applaud your attention to detail if you worked out that Jacobian without AD. There has been some success preconditioning high order methods with nonlinear reconstruction using a matrix assembled with first order upwinding (sparser and less messy to work out). I would be interested to hear if that works well for you.

On Feb 4, 2011 8:17 PM, "Travis C. Fisher" wrote: Jed, Thanks for the response. I think I have resolved this particular issue. My jacobian was incorrect. I use generalized coordinates and I missed a scaling factor. I found this by generating the matrix finite difference coloring and calculating the approximate jacobian, which was much easier than I originally anticipated. The jacobian I am trying to create is exact for convective terms based on the numerical methods (ESWENO finite difference schemes). Travis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Robert.Ellis at geosoft.com Sat Feb 5 15:54:30 2011
From: Robert.Ellis at geosoft.com (Robert Ellis)
Date: Sat, 5 Feb 2011 21:54:30 +0000
Subject: [petsc-users] PC Shell Left, Right, Symm?
Message-ID: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>

Hello Experts,

When using a KSP Shell PreConditioner, is the LEFT, RIGHT, SYMMETRIC option applicable?

call KSPGetPC(ksp,pc,ierr)
call PCSetType(pc,PCSHELL,ierr)
call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr)
call PCShellSetApply(pc,JacobiShellPCApply,ierr)

The JacobiShellPCApply PreConditioner improves convergence considerably on my KSPCG problem. However, I have found empirically that using PC_LEFT, PC_RIGHT, PC_SYMMETRIC seems to have no effect on the convergence of the solution. Can anyone explain this unusual situation?

Thanks in advance for any help.
Cheers,
Rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
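For readers working in C, the same kind of shell preconditioner setup looks roughly like the sketch below (a sketch only: JacobiShellCtx and the stored vector are placeholder names, and the apply body is just a generic Jacobi-style scaling, not code from this thread):

typedef struct {
  Vec invdiag;   /* assumed to hold the reciprocal diagonal of the operator */
} JacobiShellCtx;

PetscErrorCode JacobiShellPCApply(PC pc, Vec x, Vec y)
{
  JacobiShellCtx *ctx;
  PetscErrorCode ierr;

  ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecPointwiseMult(y, ctx->invdiag, x);CHKERRQ(ierr);  /* y = diag^{-1} x */
  return 0;
}

/* usage sketch, assuming ksp already exists and its operators are set */
PC             pc;
JacobiShellCtx shellctx;   /* shellctx.invdiag filled in by the application */
PetscErrorCode ierr;

ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc, &shellctx);CHKERRQ(ierr);
ierr = PCShellSetApply(pc, JacobiShellPCApply);CHKERRQ(ierr);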
From bsmith at mcs.anl.gov Sat Feb 5 16:27:34 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 5 Feb 2011 16:27:34 -0600
Subject: [petsc-users] PC Shell Left, Right, Symm?
In-Reply-To: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>
References: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>
Message-ID: <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov>

On Feb 5, 2011, at 3:54 PM, Robert Ellis wrote:
> Hello Experts,
> 
> When using a KSP Shell PreConditioner, is the LEFT, RIGHT, SYMMETRIC option applicable?
> 
> call KSPGetPC(ksp,pc,ierr)
> call PCSetType(pc,PCSHELL,ierr)
> call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr)
> call PCShellSetApply(pc,JacobiShellPCApply,ierr)
> 
> The JacobiShellPCApply PreConditioner improves convergence considerably on my KSPCG problem. However, I have found empirically that using PC_LEFT, PC_RIGHT, PC_SYMMETRIC seems to have no effect on the convergence of the solution. Can anyone explain this unusual situation?

1) Based on the call call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr) above you are correctly trying to set it to use symmetric preconditioning, but it must be overwritten later because our CG doesn't have symmetric (or even right) preconditioning implemented, so it would generate an error message when it tries to use it. If you run with -ksp_view it will show the side being used. Perhaps the ksp in the code fragment above is not the ksp being used in your linear solve?

2) When left or right preconditioning is being used by the Krylov method the PC object doesn't know or care, it just applies the preconditioner, in this case your JacobiShellPCApply() routine. When PC_SYMMETRIC is used the application of the preconditioner is "split" into two parts, the application of the left part and the right part, symbolically as Bleft * A * Bright. This does not mean that PCApply() is simply called twice, once to apply the right part and once to apply the left part; instead PCApplySymmetricRight() is called and then PCApplySymmetricLeft(). For example, if one wished to use symmetric ICC incomplete Cholesky preconditioning then these two operators are transposes of each other. Thus there should be two additional functions, PCShellSetApplySymmetricLeft() and PCShellSetApplySymmetricRight(), allowing you to provide the two functions. PETSc doesn't currently have these but they could be trivially added. However, since our Krylov methods are not even implemented for symmetric application of the preconditioner, it wouldn't help you.

If you use the GMRES method (just to check) you can switch the preconditioning to either side and you will see different convergence behavior. In my experience using left or right preconditioning doesn't really matter much, but there are some people who swear that one is better than the other; and different people believe different things.

BTW: With PETSc's CG you can base your convergence test on either the preconditioned or nonpreconditioned residual norm; this is controlled with KSPSetNormType().

Barry

> 
> Thanks in advance for any help.
> Cheers,
> Rob

From jed at 59A2.org Sat Feb 5 16:45:57 2011
From: jed at 59A2.org (Jed Brown)
Date: Sat, 5 Feb 2011 17:45:57 -0500
Subject: [petsc-users] PC Shell Left, Right, Symm?
In-Reply-To: <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov> References: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com> <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov> Message-ID: On Sat, Feb 5, 2011 at 17:27, Barry Smith wrote: > In my experience using left or right preconditioning doesn't really matter > much, but there are some people who swear that one is better than the other; > and different people believe different things. The important difference is whether the residuals are preconditioned or not, it is rarely in the speed of convergence (in my experience as well, but see note below). Left preconditioning causes the residuals to remove poor scaling such as penalty boundary conditions before the first residual is calculated. Right preconditioning shows you the unpreconditioned residuals. If you do a convergence test with unpreconditioned residuals (right preconditioning) and penalty boundary conditions, you might need a relative tolerance of 1e-12 on the first solve since the initial iterate does not satisfy boundary conditions, but then you might do a subsequent solve with an initial iterate that satisfies the boundary conditions, in which case you only need a relative tolerance of 1e-5 (or whatever tolerance you want in the interior). This is awkward, so left preconditioning (working with preconditioned residuals) makes sense with penalty boundary conditions. If instead you do a convergence test with preconditioned residuals (left preconditioning), but the preconditioner is singular (e.g. if you apply BoomerAMG directly to a mixed-FEM discretization of incompressible Navier-Stokes), it may erroneously appear to converge despite being nowhere near converged. In this case, right preconditioning makes sense because stagnation due to a singular preconditioner is clear. My opinion is that you should choose whichever preconditioner side evaluates residuals in the fom that is most meaningful for your discretization. Note: if you solve the block system J = [A B; C D] using the exact preconditioner P1 = [A B; 0 S] where S = D-C*inv(A)*B, then right preconditioned GMRES converges in 2 iterations while left preconditioning is not guaranteed to converge in any small number of iterations (though it may still in practice). If instead you use P2=[A 0;C S], then left-preconditioned GMRES converges in 2 while right-preconditioning does not guarantee a low iteration count. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenway at utias.utoronto.ca Sat Feb 5 18:19:50 2011 From: kenway at utias.utoronto.ca (Gaetan Kenway) Date: Sat, 5 Feb 2011 19:19:50 -0500 Subject: [petsc-users] Fortran external procedure in ctx() Message-ID: Hello I'm wondering if it is possible to put an external procedure reference in a ctx() in fortran. I'm in the process of writing a Newton--Krylov solver for an aero-structural system. My two different codes are wrapped with python and so each code is called through python for residual and preconditioning operations. Nominally this would be a good use of petsc4py but it doesn't allow for PCShell so it is no use to me. I then wrote up the solver in Fortran and attempted to use callbacks to python for computing the required information. Using f2py, I can pass my two call back functions cb1 and cb2 fortran. A schematic of the code is below: subroutine solver(cb1, cb2) ! cb1 and cb2 are python callbacks set using f2py external cb1,cb2 petscFortranAddress ctx(2) ! 
I would like to do the following, but this doesn't compile ! ctx(1) = cb1 ! ctx(2) = cb2 call SNESCreate(comm,snes,ierr) call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) call KSPGetPC(ksp,pc,ierr) call PCSetType(pc,PCSHELL,ierr) call PCShellSetContext(pc,ctx,ierr) call PCShellSetApply(pc,applyaspc_fortran,ierr) end subroutine solver subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) PC pc Vec inputVec,outputVec PetscFortranAddress ctx(2) external func call PCShellGetContext(pc,ctx,ierr) func = ctx(2) call VecGetArrayF90(inputVec,states_in,ierr) call VecGetArrayF90(outputVec,states_out,ierr) ! Call the callback to python call func(states_in,states_out,shape(states_in)) call VecRestoreArrayF90(inputVec,states_in,ierr) call VecRestoreArrayF90(outputVec,states_out,ierr) end subroutine applyaspc_fortran In general, in Fortran, is it possible to put an external function reference in a module such that I wouldn't have to try to pass it through the application ctx? I realize this may be impossible to do in Fortran. Would such a procedure be possible in C? I'm only using Fortran since I'm much more familiar with it then with C. Sorry there isn't much to go on, but any suggestions would be greatly appreciated. Gaetan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 5 19:28:52 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 5 Feb 2011 19:28:52 -0600 Subject: [petsc-users] Fortran external procedure in ctx() In-Reply-To: References: Message-ID: <6E6044B5-259F-4986-ACC2-600EF11EF529@mcs.anl.gov> On Feb 5, 2011, at 6:19 PM, Gaetan Kenway wrote: > Hello > > I'm wondering if it is possible to put an external procedure reference in a ctx() in fortran. > > I'm in the process of writing a Newton--Krylov solver for an aero-structural system. My two different codes are wrapped with python and so each code is called through python for residual and preconditioning operations. Nominally this would be a good use of petsc4py but it doesn't allow for PCShell so it is no use to me. Before making live hard by futzing around with Fortran or C lets make sure you really cannot do this in Python. What about using PCPYTHON? My guess is that this allows building your PC from pieces just like PCSHELL. Barry > > I then wrote up the solver in Fortran and attempted to use callbacks to python for computing the required information. Using f2py, I can pass my two call back functions cb1 and cb2 fortran. A schematic of the code is below: > > subroutine solver(cb1, cb2) > > ! cb1 and cb2 are python callbacks set using f2py > external cb1,cb2 > petscFortranAddress ctx(2) > > ! I would like to do the following, but this doesn't compile > ! ctx(1) = cb1 > ! ctx(2) = cb2 > > call SNESCreate(comm,snes,ierr) > call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) > > call KSPGetPC(ksp,pc,ierr) > call PCSetType(pc,PCSHELL,ierr) > > call PCShellSetContext(pc,ctx,ierr) > call PCShellSetApply(pc,applyaspc_fortran,ierr) > > end subroutine solver > > subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) > > PC pc > Vec inputVec,outputVec > PetscFortranAddress ctx(2) > external func > > call PCShellGetContext(pc,ctx,ierr) > func = ctx(2) > > call VecGetArrayF90(inputVec,states_in,ierr) > call VecGetArrayF90(outputVec,states_out,ierr) > > ! 
Call the callback to python > call func(states_in,states_out,shape(states_in)) > > call VecRestoreArrayF90(inputVec,states_in,ierr) > call VecRestoreArrayF90(outputVec,states_out,ierr) > end subroutine applyaspc_fortran > > > In general, in Fortran, is it possible to put an external function reference in a module such that I wouldn't have to try to pass it through the application ctx? I realize this may be impossible to do in Fortran. Would such a procedure be possible in C? I'm only using Fortran since I'm much more familiar with it then with C. > > Sorry there isn't much to go on, but any suggestions would be greatly appreciated. > > Gaetan From gaurish108 at gmail.com Sun Feb 6 02:11:35 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 6 Feb 2011 03:11:35 -0500 Subject: [petsc-users] BLAS library for PETSc Message-ID: How good is the BLAS library that PETSc downloads with the option "--download-f-blas-lapack=1 " during the installation step ? Is it recommended to use this BLAS library with PETSc or libraries such as ATLAS or GOTO? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sun Feb 6 02:21:22 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 6 Feb 2011 03:21:22 -0500 Subject: [petsc-users] BLAS library for PETSc In-Reply-To: References: Message-ID: The reference BLAS is "bad", but unless you are doing dense linear algebra or using a third party solver that uses BLAS level 3 internally (MUMPS), a tuned implementation will make little difference because relatively little time will be spent in BLAS, and those operations are memory bound (and can only be improved a modest amount by trickery). On Feb 6, 2011 9:11 AM, "Gaurish Telang" wrote: How good is the BLAS library that PETSc downloads with the option "--download-f-blas-lapack=1 " during the installation step ? Is it recommended to use this BLAS library with PETSc or libraries such as ATLAS or GOTO? -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Sun Feb 6 17:00:14 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Sun, 6 Feb 2011 17:00:14 -0600 Subject: [petsc-users] questions about the multigrid framework Message-ID: Hello, I have some concerns about the multigrid framework in PETSc. We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. The solver I used is a KSP solver in PETSc, which is set by calling : KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? 
Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. I did some research work on the website and found the slides by Barry on http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Feb 6 21:30:56 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 6 Feb 2011 21:30:56 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: Message-ID: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > Hello, I have some concerns about the multigrid framework in PETSc. > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 Barry We'll deal with the multigrid questions after we've resolved the more basic issues. > The solver I used is a KSP solver in PETSc, which is set by calling : > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). 
Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > I did some research work on the website and found the slides by Barry on > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > From ecoon at lanl.gov Mon Feb 7 10:26:21 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Mon, 07 Feb 2011 09:26:21 -0700 Subject: [petsc-users] Fortran external procedure in ctx() In-Reply-To: References: Message-ID: <1297095981.2263.9.camel@echo.lanl.gov> On Sat, 2011-02-05 at 19:19 -0500, Gaetan Kenway wrote: > Hello > > I'm wondering if it is possible to put an external procedure reference > in a ctx() in fortran. > > I'm in the process of writing a Newton--Krylov solver for an > aero-structural system. My two different codes are wrapped with python > and so each code is called through python for residual and > preconditioning operations. Nominally this would be a good use of > petsc4py but it doesn't allow for PCShell so it is no use to me. Maybe it's not clear to me what you're trying to do, but I think that petsc4py can make PCShells just fine. I've attached an example in pure python which uses petsc4py to generate both Mat and PC shells to solve the saddle point problem that arises from using Lagrange Multipliers to apply boundary conditions to Laplace's equation. Both the Schur complement and the full, block matrix are stored as Mat shells, and a PC shell is used to store the PC of the full matrix [[ A^-1, 0], [0, S]] and to do the inner solve required within the MatShell for S. Ethan > > I then wrote up the solver in Fortran and attempted to use callbacks > to python for computing the required information. Using f2py, I can > pass my two call back functions cb1 and cb2 fortran. A schematic of > the code is below: > > subroutine solver(cb1, cb2) > > ! cb1 and cb2 are python callbacks set using f2py > external cb1,cb2 > petscFortranAddress ctx(2) > > ! I would like to do the following, but this doesn't compile > ! ctx(1) = cb1 > ! 
ctx(2) = cb2 > > call SNESCreate(comm,snes,ierr) > call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) > > call KSPGetPC(ksp,pc,ierr) > call PCSetType(pc,PCSHELL,ierr) > > call PCShellSetContext(pc,ctx,ierr) > call PCShellSetApply(pc,applyaspc_fortran,ierr) > > end subroutine solver > > subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) > > PC pc > Vec inputVec,outputVec > PetscFortranAddress ctx(2) > external func > > call PCShellGetContext(pc,ctx,ierr) > func = ctx(2) > > call VecGetArrayF90(inputVec,states_in,ierr) > call VecGetArrayF90(outputVec,states_out,ierr) > > ! Call the callback to python > call func(states_in,states_out,shape(states_in)) > > call VecRestoreArrayF90(inputVec,states_in,ierr) > call VecRestoreArrayF90(outputVec,states_out,ierr) > end subroutine applyaspc_fortran > > > In general, in Fortran, is it possible to put an external function > reference in a module such that I wouldn't have to try to pass it > through the application ctx? I realize this may be impossible to do in > Fortran. Would such a procedure be possible in C? I'm only using > Fortran since I'm much more familiar with it then with C. > > Sorry there isn't much to go on, but any suggestions would be greatly > appreciated. > > Gaetan -------------- next part -------------- A non-text attachment was scrubbed... Name: lm_solver.py Type: text/x-python Size: 8826 bytes Desc: not available URL: From pengxwang at hotmail.com Mon Feb 7 10:49:28 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Mon, 7 Feb 2011 10:49:28 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: Thanks, Barry, I didn't run with with -ksp_monitor_true_residual -ksp_converged_reason. My own code was built based on the petsc-current/src/ksp/ksp/examples/tutorials/ex2f.F. Since the line of 248 which with KSPSetTolerances is commented out, it seems I didn't set the tolerance in my code. If I need to run with option -ksp_monitor_true_residual -ksp_converged_reason , I should add some lines like: call PetscOptionsHasName() call KSPGetConvergedReason() Am I right? In order to make the problem clear, I just attached the discription of my problem. Thanks a lot for any help from you. Following is the portion of the code with KSP solver. !============ call KSPCreate(MPI_COMM_WORLD,ksp,ierr) call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) call KSPSolve(ksp,b,x,ierr) call KSPGetIterationNumber(ksp,its,ierr) if (myid .eq. 0) then if (norm .gt. 1.e-12) then write(6,100) norm,its else write(6,110) its endif endif 100 format('Norm of error ',e10.4,' iterations ',i5) 110 format('Norm of error < 1.e-12,iterations ',i5) !============= > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. 
However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Problem_discription.pdf
Type: application/pdf
Size: 95775 bytes
Desc: not available
URL: 

From jed at 59A2.org Mon Feb 7 10:53:36 2011
From: jed at 59A2.org (Jed Brown)
Date: Mon, 7 Feb 2011 17:53:36 +0100
Subject: [petsc-users] questions about the multigrid framework
In-Reply-To: 
References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>
Message-ID: 

On Mon, Feb 7, 2011 at 17:49, Peter Wang wrote:
> If I need to run with option -ksp_monitor_true_residual -ksp_converged_reason , I should add some lines like:
> call PetscOptionsHasName ()
> call KSPGetConvergedReason()
> Am I right?

This is insane, just be sure to call KSPSetFromOptions() and then all the options will work.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From u.tabak at tudelft.nl Tue Feb 8 07:28:06 2011
From: u.tabak at tudelft.nl (Umut Tabak)
Date: Tue, 08 Feb 2011 14:28:06 +0100
Subject: [petsc-users] Operator matrix as a Matrix-free matrix
Message-ID: <4D5144E6.2090705@tudelft.nl>

Dear all,

I would like to create an operator matrix like

(I - 0.5 C^{-1}(C+kD))

for linear iterative solvers, where k is a given scalar and C and D are matrices from a FE discretization.

Moreover, the second part of the operator matrix can be constructed efficiently by using a matrix-vector product and a forward-backward substitution since I have the factorization of C which is a symmetric matrix.

I guess I should use matrix free operations and create the two matrices as shell matrices such as M1 (for I) and M2 (for the rest of above) and sum them, is this the most efficient way to do this?

Best wishes,
Umut

--
- Hope is a good thing, maybe the best of things
and no good thing ever dies...
The Shawshank Redemption, replique of Tim Robbins

From bsmith at mcs.anl.gov Tue Feb 8 07:38:59 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 8 Feb 2011 07:38:59 -0600
Subject: [petsc-users] Operator matrix as a Matrix-free matrix
In-Reply-To: <4D5144E6.2090705@tudelft.nl>
References: <4D5144E6.2090705@tudelft.nl>
Message-ID: <856AAEBD-892D-467A-9783-FE97757AAE8E@mcs.anl.gov>

On Feb 8, 2011, at 7:28 AM, Umut Tabak wrote:
> Dear all,
> 
> I would like to create an operator matrix like
> 
> (I - 0.5 C^{-1}(C+kD))
> 
> for linear iterative solvers, where k is a given scalar and C and D are matrices from a FE discretization.
> 
> Moreover, the second part of the operator matrix can be constructed efficiently by using a matrix-vector product and a forward-backward substitution since I have the factorization of C which is a symmetric matrix.
> 
> I guess I should use matrix free operations and create the two matrices as shell matrices such as M1 (for I) and M2 (for the rest of above) and sum them, is this the most efficient way to do this?

I would make a single MATSHELL, inside it I would store the k, the matrix C and the matrix D, in addition I would store in it a KSP object where I have called KSPSetOperators() with the C matrix. Then the PCApply for the shell matrix could be .5*( I - k* kspsolve(C)*D) if I have my math correct. No reason that I can see for having more than one shell matrix.

Barry

> Best wishes,
> Umut
> 
> --
> - Hope is a good thing, maybe the best of things
> and no good thing ever dies...
> The Shawshank Redemption, replique of Tim Robbins
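A minimal C sketch of the single shell matrix described above (a sketch only: ShellCtx, ShellMult and the size variables are assumed names; the multiply applies y = 0.5*(x - k*C^{-1}*D*x), the same algebra as .5*(I - k*kspsolve(C)*D); the four-argument KSPSetOperators() matches the PETSc releases current at the time of this thread):

typedef struct {
  Mat         C, D;   /* the FE matrices */
  PetscScalar k;
  KSP         ksp;    /* inner solver whose operator is C */
  Vec         work;   /* scratch vector with the same layout as x */
} ShellCtx;

PetscErrorCode ShellMult(Mat A, Vec x, Vec y)
{
  ShellCtx       *ctx;
  PetscErrorCode ierr;

  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->D, x, ctx->work);CHKERRQ(ierr);     /* work = D x        */
  ierr = KSPSolve(ctx->ksp, ctx->work, y);CHKERRQ(ierr);  /* y    = C^{-1} D x */
  ierr = VecAYPX(y, -ctx->k, x);CHKERRQ(ierr);            /* y    = x - k*y    */
  ierr = VecScale(y, 0.5);CHKERRQ(ierr);                  /* y    = 0.5*y      */
  return 0;
}

/* setup sketch; m,n are the local and M,N the global sizes of C */
ierr = KSPCreate(comm, &ctx.ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ctx.ksp, C, C, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ctx.ksp);CHKERRQ(ierr);
ierr = MatGetVecs(C, &ctx.work, PETSC_NULL);CHKERRQ(ierr);
ctx.C = C; ctx.D = D; ctx.k = k;
ierr = MatCreateShell(comm, m, n, M, N, &ctx, &A);CHKERRQ(ierr);
ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))ShellMult);CHKERRQ(ierr);

The shell matrix A can then be handed to an outer KSP just like an assembled matrix; only MATOP_MULT is provided here, so anything that needs more than matrix-vector products (for example a factorization-based preconditioner) would still have to come from elsewhere.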
From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 10:16:10 2011
From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann)
Date: Tue, 08 Feb 2011 17:16:10 +0100
Subject: [petsc-users] Howto evaluate function on grid
Message-ID: <4D516C4A.6040207@physik.uni-freiburg.de>

Dear all,

I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the program to higher dimensions and also unstructured grids. As I understand it now there are several candidates in PETSC for doing this:

1) PF
2) DALocalFunction
3) FIAT (?)

Could you please advise on what should be used? An additional problem is that several distinct functions should be evaluated at the same time due to the reuse of intermediate results.

Any help is appreciated!
Thanks in advance,
Klaus

------------------8<------------------8<------------------8<------------------8<------------------8<--------------

PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) {
  PetscErrorCode ierr;
  PetscInt i, j, cxs, cxm, cys, cym;
  PetscScalar **lpsi;
  PetscScalar **lGradPsi_x, **lGradPsi_y;
  Vec gc;
  DACoor2d **coors;
  DA cda;
  ierr = DAGetCoordinateDA(user->zGrid, &cda);CHKERRQ(ierr);
  ierr = DAGetCoordinates(user->zGrid, &gc);CHKERRQ(ierr);
  ierr = DAVecGetArray(cda, gc, &coors);CHKERRQ(ierr);
  ierr = DAGetCorners(cda, &cxs, &cys, PETSC_NULL, &cxm, &cym, PETSC_NULL);CHKERRQ(ierr);

  ierr = DAVecGetArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr);
  ierr = DAVecGetArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr);
  ierr = DAVecGetArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr);
  for(i=cys; i<cys+cym; i++) {
    for(j=cxs; j<cxs+cxm; j++) {
      PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y),
                y = PetscRealPart(coors[i][j].y);
      if(x>=0) {
        ierr = EvaluatePsiAndGradPsi(user, x, y,
                                     &(lpsi[i][j]),
                                     &(lGradPsi_x[i][j]),
                                     &(lGradPsi_y[i][j]));CHKERRQ(ierr);
      }
    }
  }
  ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(cda, gc, &coors);CHKERRQ(ierr);
  ierr = VecDestroy(gc);CHKERRQ(ierr);
  ierr = DADestroy(cda);CHKERRQ(ierr);
  return 0;
}

From bsmith at mcs.anl.gov Tue Feb 8 10:29:48 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 8 Feb 2011 10:29:48 -0600
Subject: [petsc-users] Howto evaluate function on grid
In-Reply-To: <4D516C4A.6040207@physik.uni-freiburg.de>
References: <4D516C4A.6040207@physik.uni-freiburg.de>
Message-ID: 

On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote:
> Dear all,
> 
> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message).
> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the
> program to higher dimensions and also unstructured grids.

There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them.
> As I understand it now there are several > candidates in PETSC for doing this: > > 1) PF > 2) DALocalFunction Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() > 3) FIAT (?) > > Could you please advise on what should be used? > An additional problem is that several distinct functions should be evaluated at the same time due to the > due to the reuse of intermediate results. Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. Barry > > Any help is appreciated! > Thanks in advance, > Klaus > > ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- > > PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { > PetscErrorCode ierr; > PetscInt i, j, cxs, cxm, cys, cym; > PetscScalar **lpsi; > PetscScalar **lGradPsi_x, **lGradPsi_y; > Vec gc; > DACoor2d **coors; > DA cda; > ierr = DAGetCoordinateDA(user->zGrid, &cda);CHKERRQ(ierr); > ierr = DAGetCoordinates(user->zGrid, &gc);CHKERRQ(ierr); > ierr = DAVecGetArray(cda, gc, &coors);CHKERRQ(ierr); > ierr = DAGetCorners(cda, &cxs, &cys, PETSC_NULL, &cxm, &cym, PETSC_NULL);CHKERRQ(ierr); > > ierr = DAVecGetArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr); > ierr = DAVecGetArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr); > ierr = DAVecGetArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr); > for(i=cys; i for(j=cxs; j PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), > y = PetscRealPart(coors[i][j].y); > if(x>=0) { > ierr = EvaluatePsiAndGradPsi(user, x, y, > &(lpsi[i][j]), > &(lGradPsi_x[i][j]), > &(lGradPsi_y[i][j]));CHKERRQ(ierr); > } > } > } > ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr); > ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr); > ierr = DAVecRestoreArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr); > ierr = DAVecRestoreArray(cda, gc, &coors);CHKERRQ(ierr); > ierr = VecDestroy(gc);CHKERRQ(ierr); > ierr = DADestroy(cda);CHKERRQ(ierr); > return 0; > } From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 10:52:01 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 17:52:01 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: References: <4D516C4A.6040207@physik.uni-freiburg.de> Message-ID: <4D5174B1.8070601@physik.uni-freiburg.de> Hi Barry, thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. The evaluation in more details goes like this: Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. I have a constant matrix C in the appctx. 
The three quantities I am interested in are then: 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) With this (hopefully better) description let me answer to your points below individually. On 02/08/2011 05:29 PM, Barry Smith wrote: > On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: > >> Dear all, >> >> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >> program to higher dimensions and also unstructured grids. > There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. I don't really need any solving. Do I still need different FormFunctions? >> As I understand it now there are several >> candidates in PETSC for doing this: >> >> 1) PF >> 2) DALocalFunction > Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. >> 3) FIAT (?) >> >> Could you please advise on what should be used? >> An additional problem is that several distinct functions should be evaluated at the same time due to the >> due to the reuse of intermediate results. > > Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. I am not really using any solving at the moment. Please let me know if you need more detail. Thanks again, Klaus > Barry > >> Any help is appreciated! 
>> Thanks in advance, >> Klaus >> >> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >> >> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >> PetscErrorCode ierr; >> PetscInt i, j, cxs, cxm, cys, cym; >> PetscScalar **lpsi; >> PetscScalar **lGradPsi_x, **lGradPsi_y; >> Vec gc; >> DACoor2d **coors; >> DA cda; >> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >> >> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >> for(i=cys; i> for(j=cxs; j> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >> y = PetscRealPart(coors[i][j].y); >> if(x>=0) { >> ierr = EvaluatePsiAndGradPsi(user, x, y, >> &(lpsi[i][j]), >> &(lGradPsi_x[i][j]), >> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >> } >> } >> } >> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >> ierr = VecDestroy(gc);CHKERRQ(ierr); >> ierr = DADestroy(cda);CHKERRQ(ierr); >> return 0; >> } From bsmith at mcs.anl.gov Tue Feb 8 11:00:41 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 8 Feb 2011 11:00:41 -0600 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <4D5174B1.8070601@physik.uni-freiburg.de> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> On Feb 8, 2011, at 10:52 AM, Klaus Zimmermann wrote: > Hi Barry, > > thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. > > The evaluation in more details goes like this: > Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. > I have a constant matrix C in the appctx. The three quantities I am interested in are then: > > 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) > 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) > 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) > How big are S? > With this (hopefully better) description let me answer to your points below individually. > On 02/08/2011 05:29 PM, Barry Smith wrote: >> On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: >> >>> Dear all, >>> >>> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >>> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >>> program to higher dimensions and also unstructured grids. >> There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. > This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. > I don't really need any solving. Do I still need different FormFunctions? 
NO > >>> As I understand it now there are several >>> candidates in PETSC for doing this: >>> >>> 1) PF >>> 2) DALocalFunction >> Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() > I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. > In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? > Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. You can keep them separate. You can have as many vector inputs and outputs you want (it is only the SNES solvers that need exactly one input and one output). Sometimes storing several vectors "interlaced" gives better performance since it uses the cache's better but that is only an optimization. If you separate the "iterater" part of the code from the "function" part then you can have a different iterator for structured and unstructured grid but reuse the "function" part. > >>> 3) FIAT (?) >>> >>> Could you please advise on what should be used? >>> An additional problem is that several distinct functions should be evaluated at the same time due to the >>> due to the reuse of intermediate results. >> >> Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. > I am not really using any solving at the moment. Please let me know if you need more detail. > > Thanks again, > Klaus > >> Barry >> >>> Any help is appreciated! 
>>> Thanks in advance, >>> Klaus >>> >>> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >>> >>> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >>> PetscErrorCode ierr; >>> PetscInt i, j, cxs, cxm, cys, cym; >>> PetscScalar **lpsi; >>> PetscScalar **lGradPsi_x, **lGradPsi_y; >>> Vec gc; >>> DACoor2d **coors; >>> DA cda; >>> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >>> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >>> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >>> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >>> >>> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>> for(i=cys; i>> for(j=cxs; j>> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >>> y = PetscRealPart(coors[i][j].y); >>> if(x>=0) { >>> ierr = EvaluatePsiAndGradPsi(user, x, y, >>> &(lpsi[i][j]), >>> &(lGradPsi_x[i][j]), >>> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >>> } >>> } >>> } >>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >>> ierr = VecDestroy(gc);CHKERRQ(ierr); >>> ierr = DADestroy(cda);CHKERRQ(ierr); >>> return 0; >>> } > From jed at 59A2.org Tue Feb 8 11:03:20 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 8 Feb 2011 18:03:20 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <4D5174B1.8070601@physik.uni-freiburg.de> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: On Tue, Feb 8, 2011 at 17:52, Klaus Zimmermann < klaus.zimmermann at physik.uni-freiburg.de> wrote: > I agree. This is mostly because I didn't understand the concepts so well at > the time I wrote this code and one of the reasons why I would like to > refactor. > In my case there should in principle be three output vectors. All the > facilities I have seen in petsc only deal with a single output vector. Is > this correct? > Of course there is an obvious mapping, but I would prefer to keep the > vectors apart because that way it is easier to deal with the parallel > layout. > Packing them together will give you better memory performance. You can extract separate pieces with the VecStride functions if you need it separate. If you have a really good reason for storing them separately, petsc-dev has VecNest which lets you treat several vectors as one, but some operations are more expensive and I would not recommend using it for your purposes. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 11:15:48 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 18:15:48 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> Message-ID: <4D517A44.8070502@physik.uni-freiburg.de> On 02/08/2011 06:00 PM, Barry Smith wrote: > On Feb 8, 2011, at 10:52 AM, Klaus Zimmermann wrote: > >> Hi Barry, >> >> thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. >> >> The evaluation in more details goes like this: >> Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. >> I have a constant matrix C in the appctx. The three quantities I am interested in are then: >> >> 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) >> 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) >> 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) >> > How big are S? Depending on parameter from 100 to 1000. Also C is dense. >> With this (hopefully better) description let me answer to your points below individually. >> On 02/08/2011 05:29 PM, Barry Smith wrote: >>> On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: >>> >>>> Dear all, >>>> >>>> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >>>> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >>>> program to higher dimensions and also unstructured grids. >>> There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. >> This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. >> I don't really need any solving. Do I still need different FormFunctions? > NO Ok. >>>> As I understand it now there are several >>>> candidates in PETSC for doing this: >>>> >>>> 1) PF >>>> 2) DALocalFunction >>> Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() >> I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. >> In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? >> Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. > You can keep them separate. You can have as many vector inputs and outputs you want (it is only the SNES solvers that need exactly one input and one output). 
Sometimes storing several vectors "interlaced" gives > better performance since it uses the cache's better but that is only an optimization. > > If you separate the "iterater" part of the code from the "function" part then you can have a different iterator for structured and unstructured grid but reuse the "function" part. So is there some general iterator code I could use? With regards to the vector layout: After the evaluation I want to calculate quantities like PointwiseMult(VecConjugate(u1),u2). I thought that for this it would be advantageous to have the output vectors laid out in the same way. Do you think the interleaved layout works as well? >>>> 3) FIAT (?) >>>> >>>> Could you please advise on what should be used? >>>> An additional problem is that several distinct functions should be evaluated at the same time due to the >>>> due to the reuse of intermediate results. >>> Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. >> I am not really using any solving at the moment. Please let me know if you need more detail. >> >> Thanks again, >> Klaus >> >>> Barry >>> >>>> Any help is appreciated! >>>> Thanks in advance, >>>> Klaus >>>> >>>> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >>>> >>>> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >>>> PetscErrorCode ierr; >>>> PetscInt i, j, cxs, cxm, cys, cym; >>>> PetscScalar **lpsi; >>>> PetscScalar **lGradPsi_x, **lGradPsi_y; >>>> Vec gc; >>>> DACoor2d **coors; >>>> DA cda; >>>> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >>>> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >>>> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >>>> >>>> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>>> for(i=cys; i<cys+cym; i++) { >>>> for(j=cxs; j<cxs+cxm; j++) { >>>> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >>>> y = PetscRealPart(coors[i][j].y); >>>> if(x>=0) { >>>> ierr = EvaluatePsiAndGradPsi(user, x, y, >>>> &(lpsi[i][j]), >>>> &(lGradPsi_x[i][j]), >>>> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >>>> } >>>> } >>>> } >>>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >>>> ierr = VecDestroy(gc);CHKERRQ(ierr); >>>> ierr = DADestroy(cda);CHKERRQ(ierr); >>>> return 0; >>>> } From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 11:17:43 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 18:17:43 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: <4D517AB7.6060205@physik.uni-freiburg.de> On 02/08/2011 06:03 PM, Jed Brown wrote: > On Tue, Feb 8, 2011 at 17:52, Klaus Zimmermann > >
wrote: > > I agree. This is mostly because I didn't understand the concepts > so well at the time I wrote this code and one of the reasons why I > would like to refactor. > In my case there should in principle be three output vectors. All > the facilities I have seen in petsc only deal with a single output > vector. Is this correct? > Of course there is an obvious mapping, but I would prefer to keep > the vectors apart because that way it is easier to deal with the > parallel layout. > > > Packing them together will give you better memory performance. You can > extract separate pieces with the VecStride functions if you need it > separate. If you have a really good reason for storing them > separately, petsc-dev has VecNest which lets you treat several vectors > as one, but some operations are more expensive and I would not > recommend using it for your purposes. Thanks for the info. I guess I'll have them interleaved then and extract the components for the global calculations afterwards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 09:58:34 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 09:58:34 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: Thanks Barry, I ran the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation did not really converge. After I set a smaller rtol and allowed more iterations, the numerical solution gets better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varies mainly in y direction) at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. and it didn't converge yet. I am wondering whether, if the solver is changed, the convergence speed could get faster? Or should I take another approach to use finer grids, like multigrid? Thanks for your help. > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here.
99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:00:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:00:37 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, > and it turns out that the computation didn't get the real convergence. > After I set the rtol and more iteration, the numerical solution get better. > However, the computation converges very slowly with finer grid points. For > example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution > varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| > 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| > 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get > fater? Or, I should take anohte approach to use finer grids, like multigrid? > Thanks for your help. > If you can get MG to work for your problem, its optimal. All the Krylov methods alone will get worse with increasing grid size. 
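For reference, the convergence checks recommended in this thread are ordinary runtime options; a typical run of an application that calls KSPSetFromOptions() might look like the line below (the executable name and process count are placeholders, not from the thread).

  mpiexec -n 4 ./my_solver -ksp_type gmres -ksp_rtol 1e-12 -ksp_monitor_true_residual -ksp_converged_reason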
Matt > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety > in length scales. The length of computational domain is in order of 1e3 m, > and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in > a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number > of uniform or non-uniform grids. However, the error of the numerical > solution increases when the number of the grid is too large. In order to > test the effect of the grid size on the solution, a domain with regular > scale of 1m by 1m was tried to solve. It is found that the extreme small > grid size might lead to large variation to the exact solution. For example, > the exact solution is a linear distribution in the domain. The numerical > solution is linear as similar as the exact solution when the grid number is > nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the > numerical solution varies to nonlinear distribution which boundary is the > only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, > with a finer grid your solution (for a problem with a known solution for > example) will be more accurate and won't suddenly get less accurate with a > finer mesh. > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to > make sure that it is converging? and using a smaller -ksp_rtol for > more grid points. For example with 10,000 grid points in each direction and > no better idea of what the discretization error is I would use a tol of > 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more > basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this > solver is not suitable to the system with small size grid? Or, whether the > problem crossing 6 orders of length scale is solvable with only one level > grid system when the memory is enough for large matrix? Since there is less > coding work for one level grid size, it would be easy to implement the > solver. > > > > > > I did some research work on the website and found the slides by Barry > on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach > to our problem. We are thinking to turn to the multigrid framework in PETSc > to solve the problem. However, before we dig into it, there are some issues > confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large > variety in length scales (up to 6 orders)? Is DMMG is the best tool for our > problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were > created for the finite difference scheme of the domain and solved by KSP > solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it > easy to immigrate the created Matrix A and Vector b to the multigrid > framework? 
> > > > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From adube1 at tigers.lsu.edu Wed Feb 9 10:10:48 2011 From: adube1 at tigers.lsu.edu (Anuj Dube) Date: Wed, 9 Feb 2011 10:10:48 -0600 Subject: [petsc-users] Regarding Installation Message-ID: Dear Sir/ Madam I have downloaded the zip folder but it seems that the setup file is .py and I do not have python on my system. Is there a way to install PETSc to my windows system without downloading Python? -- Anuj Dube Dept. of Computer Science Louisiana State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:31:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:31:37 -0600 Subject: [petsc-users] Regarding Installation In-Reply-To: References: Message-ID: On Wed, Feb 9, 2011 at 10:10 AM, Anuj Dube wrote: > Dear Sir/ Madam > > I have downloaded the zip folder but it seems that that the setup file is > .py and I do not have python on my system. Is there a way to install PETSc > to my windows system without downloading Python? No. The configuration system uses Python and the build system uses Cygwin (which has Python). Matt > > -- > > > Anuj Dube > Dept. of Computer Science > Louisiana State University > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Feb 9 10:44:24 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 9 Feb 2011 10:44:24 -0600 (CST) Subject: [petsc-users] Regarding Installation In-Reply-To: References: Message-ID: On Wed, 9 Feb 2011, Matthew Knepley wrote: > On Wed, Feb 9, 2011 at 10:10 AM, Anuj Dube wrote: > > > Dear Sir/ Madam > > > > I have downloaded the zip folder but it seems that that the setup file is > > .py and I do not have python on my system. Is there a way to install PETSc > > to my windows system without downloading Python? > > > No. The configuration system uses Python and the build system uses Cygwin > (which has Python). i.e. you need cygwin-python and other cygwin tools - not regular/win-python. Check the installation instructions. Satish From pengxwang at hotmail.com Wed Feb 9 10:44:24 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 10:44:24 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , Message-ID: Thanks, Matt, Did you mean all the Krylov methods alone will get worse with an increasing number of grid points? Since the finer grid has a smaller grid size and a larger number of grid points. Since I am a new user of PETSc, the easiest way for me is still to keep using the KSP solver. However, if the solver cannot satisfy the speed requirement, I am thinking of using the MG method. However, I don't have any experience with multigrid. Could you please give me some suggestions on it?
1, Since I have built the Matrix and the vector for finite difference scheme in KSP solver, where should I start from to transfer to multigrid? I studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good prototype to be based on to create my own code? Is DMMG is the best tool for my problem? 2, How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? 3, The procedure of building Matrix and RHS vector in MG method is to build the matrix and RHS in the finest grid level and the MG will start the computation from the coarsest level, right? Thanks for your considerate reponse. Date: Wed, 9 Feb 2011 10:00:37 -0600 From: knepley at gmail.com To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] questions about the multigrid framework On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: Thanks Barry, I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. and it didn't converge yet. I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. If you can get MG to work for your problem, its optimal. All the Krylov methods alone will get worse with increasing grid size. Matt > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. 
For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:53:00 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:53:00 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 10:44 AM, Peter Wang wrote: > Thank, Matt, > > Did you mean All the Krylov methods alone will get worse with increasing > grid number? Since the finer grid has smaller size and more number of grid. > > Since I am a new user of PETSc, the easiest way for me is still keep in > KSP solver. However, if the solver cannot satisfy the speed reqirement. I am > thinking to use MG method. However, I don't have any experience > on multigrid. Could you please give me some suggestion on it? > The best thing to do is get a book about solvers and preconditioners. All your questions depend on what type of operator you are trying to invert. I recommend Saad for Iteravtive Methods and maybe Briggs for an intro to MG. Barry's book has a good overview of Domain Decomposition. Thanks, Matt > > 1, Since I have built the Matrix and the vector for finite difference > scheme in KSP solver, where should I start from to transfer to multigrid? I > studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good > prototype to be based on to create my own code? Is DMMG is the best tool > for my problem? 
> > > 2, How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > 3, The procedure of building Matrix and RHS vector in MG method is to > build the matrix and RHS in the finest grid level and the MG will start the > computation from the coarsest level, right? > > Thanks for your considerate reponse. > > > > > ------------------------------ > Date: Wed, 9 Feb 2011 10:00:37 -0600 > From: knepley at gmail.com > > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: > > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, > and it turns out that the computation didn't get the real convergence. > After I set the rtol and more iteration, the numerical solution get better. > However, the computation converges very slowly with finer grid points. For > example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution > varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| > 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| > 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get > fater? Or, I should take anohte approach to use finer grids, like multigrid? > Thanks for your help. > > > If you can get MG to work for your problem, its optimal. All the Krylov > methods alone will get worse with increasing grid size. > > Matt > > > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety > in length scales. The length of computational domain is in order of 1e3 m, > and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in > a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number > of uniform or non-uniform grids. However, the error of the numerical > solution increases when the number of the grid is too large. In order to > test the effect of the grid size on the solution, a domain with regular > scale of 1m by 1m was tried to solve. It is found that the extreme small > grid size might lead to large variation to the exact solution. For example, > the exact solution is a linear distribution in the domain. The numerical > solution is linear as similar as the exact solution when the grid number is > nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the > numerical solution varies to nonlinear distribution which boundary is the > only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, > with a finer grid your solution (for a problem with a known solution for > example) will be more accurate and won't suddenly get less accurate with a > finer mesh. > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to > make sure that it is converging? and using a smaller -ksp_rtol for > more grid points. 
For example with 10,000 grid points in each direction and > no better idea of what the discretization error is I would use a tol of > 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more > basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this > solver is not suitable to the system with small size grid? Or, whether the > problem crossing 6 orders of length scale is solvable with only one level > grid system when the memory is enough for large matrix? Since there is less > coding work for one level grid size, it would be easy to implement the > solver. > > > > > > I did some research work on the website and found the slides by Barry > on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach > to our problem. We are thinking to turn to the multigrid framework in PETSc > to solve the problem. However, before we dig into it, there are some issues > confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large > variety in length scales (up to 6 orders)? Is DMMG is the best tool for our > problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were > created for the finite difference scheme of the domain and solved by KSP > solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it > easy to immigrate the created Matrix A and Vector b to the multigrid > framework? > > > > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 9 10:54:19 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 9 Feb 2011 17:54:19 +0100 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 17:44, Peter Wang wrote: > Did you mean All the Krylov methods alone will get worse with increasing > grid number? > Yes, the number of Krylov iterations for second order elliptic problems with no preconditioner scales proportional to the number of grid points in any direction. You need a spectrally equivalent preconditioner, usually multigrid of some sort, to prevent this. > Since the finer grid has smaller size and more number of grid. > > Since I am a new user of PETSc, the easiest way for me is still keep in > KSP solver. However, if the solver cannot satisfy the speed reqirement. I am > thinking to use MG method. However, I don't have any experience > on multigrid. Could you please give me some suggestion on it? > > 1, Since I have built the Matrix and the vector for finite difference > scheme in KSP solver, where should I start from to transfer to multigrid? I > studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. 
Is it a good > prototype to be based on to create my own code? Is DMMG is the best tool > for my problem? > Assuming you currently assemble a matrix, just configure PETSc with --download-ml and --download-hypre, then try running your code with -pc_type ml or -pc_type hypre. You can use geometric multigrid later to improve the constants or handle cases where algebraic multigrid (ML or BoomerAMG from Hypre) are having trouble. You need to tell us what equations you are solving if you want useful suggestions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 11:22:59 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 11:22:59 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , , , Message-ID: Thanks a lot, Jed, I will try the algebraic multigrid first. In order to configure PETSc with --download-ml and --download-hypre, which shell file in unix should I modify? Should I add some line in my current code to run with -pc_type ml or -pc_type hypre, or just use a runtime option? I am solving a 2-D Poisson equation with a finite difference scheme. Please find the problem description attached if it is necessary. Thanks again. Date: Wed, 9 Feb 2011 17:54:19 +0100 From: jed at 59A2.org To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] questions about the multigrid framework On Wed, Feb 9, 2011 at 17:44, Peter Wang wrote: Did you mean All the Krylov methods alone will get worse with increasing grid number? Yes, the number of Krylov iterations for second order elliptic problems with no preconditioner scales proportional to the number of grid points in any direction. You need a spectrally equivalent preconditioner, usually multigrid of some sort, to prevent this. Since the finer grid has smaller size and more number of grid. Since I am a new user of PETSc, the easiest way for me is still keep in KSP solver. However, if the solver cannot satisfy the speed reqirement. I am thinking to use MG method. However, I don't have any experience on multigrid. Could you please give me some suggestion on it? 1, Since I have built the Matrix and the vector for finite difference scheme in KSP solver, where should I start from to transfer to multigrid? I studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good prototype to be based on to create my own code? Is DMMG is the best tool for my problem? Assuming you currently assemble a matrix, just configure PETSc with --download-ml and --download-hypre, then try running your code with -pc_type ml or -pc_type hypre. You can use geometric multigrid later to improve the constants or handle cases where algebraic multigrid (ML or BoomerAMG from Hypre) are having trouble. You need to tell us what equations you are solving if you want useful suggestions. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
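To make the mechanics concrete: nothing in the application source has to change as long as it calls KSPSetFromOptions(); the external packages are added when PETSc is configured, and the preconditioner is then chosen on the command line. The lines below are only a sketch; the existing configure options, process count and executable name are placeholders.

  ./configure --download-ml --download-hypre [...the other options already in use...]
  make
  mpiexec -n 4 ./my_solver -pc_type ml -ksp_monitor_true_residual -ksp_converged_reason
  mpiexec -n 4 ./my_solver -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor_true_residual -ksp_converged_reason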
Name: Problem_discription.pdf Type: application/pdf Size: 101028 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 9 11:36:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 9 Feb 2011 11:36:47 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> On Feb 9, 2011, at 9:58 AM, Peter Wang wrote: > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. You have a little confusion here. Multigrid (in the context of PETSc and numerical solvers) is ONLY an efficient way to solve a set of linear equations arising from discretizing a PDE. It is not a different way of discretizing the PDEs or giving a different or better solution. It is only a way of getting the same solution (potentially much) faster than running the slower convergent solver. Definitely configure PETSc with --download-ml --download-hypre and make runs using -pc_type hypre and then -pc_type ml to see how algebraic multigrid works, it should work fine for your problem. Barry > > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. 
> > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > > > I did some research work on the website and found the slides by Barry on > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > > > From jdbst21 at gmail.com Wed Feb 9 13:02:00 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Wed, 9 Feb 2011 14:02:00 -0500 Subject: [petsc-users] Metis NodeND in Petsc Message-ID: Hello, I have been looking for an easy was to use Metis's NodeND in Petsc in a similar fashion as kways. I was wondering if there is some way to go about this using the metis interface. Josh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pflath at ices.utexas.edu Wed Feb 9 14:22:15 2011 From: pflath at ices.utexas.edu (Pearl Flath) Date: Wed, 9 Feb 2011 14:22:15 -0600 Subject: [petsc-users] Matrix free SNES and Jacobian, function evaluations Message-ID: Dear All, I'd like to use the SNES nonlinear solvers. For my problem, evaluation of the Jacobian and the right hand side function involves some identical steps. I'd prefer not to repeat those, and instead have the Jacobian and function calculation share some computations. Is there a way to do this? Does SNES consistently evaluate one of them first, and thus I could have the other one re-use that information? Or is there a way to tell SNES to call a general update at each step before evaluating the Jacobian and function? Many thanks. -------------- next part -------------- An HTML attachment was scrubbed... 
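The reply that follows suggests caching the shared work in the user context during the function evaluation and reusing it when the Jacobian is requested. A rough sketch of that pattern with the PETSc 3.1-era callback signatures; the AppCtx layout and the ExpensiveShared() helper are made-up names, not part of PETSc or of this thread.

typedef struct {
  Vec shared;                              /* intermediate quantity needed by both callbacks */
} AppCtx;

PetscErrorCode FormFunction(SNES snes,Vec x,Vec f,void *ctx)
{
  AppCtx        *user = (AppCtx*)ctx;
  PetscErrorCode ierr;
  ierr = ExpensiveShared(x,user->shared);CHKERRQ(ierr); /* compute the common piece once */
  /* ... assemble f from x and user->shared ... */
  return 0;
}

PetscErrorCode FormJacobian(SNES snes,Vec x,Mat *A,Mat *B,MatStructure *flag,void *ctx)
{
  AppCtx *user = (AppCtx*)ctx;
  /* in the usual Newton flow the most recent function evaluation was at this same x,
     so user->shared can be reused here instead of being recomputed */
  /* ... assemble *B from x and user->shared ... */
  *flag = SAME_NONZERO_PATTERN;
  return 0;
}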
URL: From jed at 59A2.org Wed Feb 9 14:31:11 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 9 Feb 2011 21:31:11 +0100 Subject: [petsc-users] Matrix free SNES and Jacobian, function evaluations In-Reply-To: References: Message-ID: The function is always evaluated before the jacobian, but sometimes the function is evaluated at several places before a jacobian is needed (e.g. in a line search). You can cache any information you want in the user context during function evaluation and use it to speed up jacobian evaluation. On Feb 9, 2011 9:22 PM, "Pearl Flath" wrote: Dear All, I'd like to use the SNES nonlinear solvers. For my problem, evaluation of the Jacobian and the right hand side function involves some identical steps. I'd prefer not to repeat those, and instead have the Jacobian and function calculation share some computations. Is there a way to do this? Does SNES consistently evaluate one of them first, and thus I could have the other one re-use that information? Or is there a way to tell SNES to call a general update at each step before evaluating the Jacobian and function? Many thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 16:29:48 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 16:29:48 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> References: , , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> Message-ID: Thanks a lot, Barry and Jed. Your explain is very clear and informative. Your suggestions make me move forward to my goal smoothly. I will try it. > From: bsmith at mcs.anl.gov > Date: Wed, 9 Feb 2011 11:36:47 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 9, 2011, at 9:58 AM, Peter Wang wrote: > > > Thanks Barry, > > > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) > > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 > > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. > > and it didn't converge yet. > > > > I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. > > You have a little confusion here. Multigrid (in the context of PETSc and numerical solvers) is ONLY an efficient way to solve a set of linear equations arising from discretizing a PDE. It is not a different way of discretizing the PDEs or giving a different or better solution. It is only a way of getting the same solution (potentially much) faster than running the slower convergent solver. > > Definitely configure PETSc with --download-ml --download-hypre and make runs using -pc_type hypre and then -pc_type ml to see how algebraic multigrid works, it should work fine for your problem. 
> > Barry > > > > > > > > > From: bsmith at mcs.anl.gov > > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > > To: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > > > > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > > > > > Barry > > > > > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > > > > > I did some research work on the website and found the slides by Barry on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? 
> > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenway at utias.utoronto.ca Wed Feb 9 17:43:59 2011 From: kenway at utias.utoronto.ca (Gaetan Kenway) Date: Wed, 9 Feb 2011 18:43:59 -0500 Subject: [petsc-users] PETSc MatMFFDSetFunction Function Message-ID: Hello I was wondering what the user supplied function is supposed to look like for setting the function in MatMFFDSetFunction. I am trying to use a Matrix-Free Matrix for a linear Krylov Solver. The website says: PetscErrorCode PETSCMAT_DLLEXPORT MatMFFDSetFunction(Mat mat,PetscErrorCode (*func)(void*,Vec,Vec),void *funcctx) This indicates the function should have the calling sequence: (void *,Vec,Vec). Since there are zero examples of actually using this function, what exactly is the sequence? I gather that the second and third arguments are Vec x and Vec y where x is the input and y is the output, but what is the void * supposed to be? I'm doing this in Fortran, so I really don't know what argument "void *" should correspond to. Currently my code looks like this: ! Setup Matrix-Free dRdw matrix call MatCreateMFFD(sumb_comm_world,nDimW,nDimW,& PETSC_DETERMINE,PETSC_DETERMINE,dRdw,ierr) call MatMFFDSetFunction(dRdw,FormFunction2,ctx,ierr) call MatAssemblyBegin(dRdw,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(dRdw,MAT_FINAL_ASSEMBLY,ierr) The function prototype for FormFunction2 is: subroutine FormFunction2(mfmat,wVec,rVec,ierr) Mat mfmat Vec wVec, rVec PetscInt ierr end subroutine FormFunction2 When I try to use this in a KSP linear solve I get the following traceback: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatMult line 1877 src/mat/interface/matrix.c [0]PETSC ERROR: [0] PCApplyBAorAB line 540 src/ksp/pc/interface/precon.c [0]PETSC ERROR: [0] GMREScycle line 132 src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_GMRES line 227 src/ksp/ksp/impls/gmres/gmres.c If I use KSPPREONLY it works fine. Thank you, Gaetan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 9 18:07:12 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 9 Feb 2011 18:07:12 -0600 Subject: [petsc-users] PETSc MatMFFDSetFunction Function In-Reply-To: References: Message-ID: <6F3BBCF6-BBC2-4CD6-9FC6-E2BCFA8A7BAB@mcs.anl.gov> Gaetan, This is not normally used by users since most people work with SNES and SNES provides a simple wrapper that uses it with the nonlinear function you provide to SNES.
If you are writing your own Newton's method and using KSP we recommend instead that you use SNES to handle the Newton's method for you. Anyways from Fortran the calling sequence is myfunction(void *ctx,Vec x,Vec y, integer ierr) where ierr is where you put 0 or an error code if you detect an error in your routine. You can pass PETSC_OBJECT_NULL as ctx and just no use it or you can pass an array or other Fortran thing that contains information that you wish to use in your function evaluation. If it still crashes after you get the right calling sequence you can use -start_in_debugger to see why it is crashing and quickly resolve the problem. Barry On Feb 9, 2011, at 5:43 PM, Gaetan Kenway wrote: > Hello > > I was wondering what the user supplied function is supposed to look like for setting the function in MatMFFDSetFunction. I am trying to use a Matrix-Free Matrix for a linear Krylov Solver. The website says: > PetscErrorCode PETSCMAT_DLLEXPORT MatMFFDSetFunction(Mat mat,PetscErrorCode (*func)(void*,Vec,Vec),void *funcctx) > > This indicates the function should have the calling sequence: (void *,Vec,Vec). Since there are zero examples of actually using this function, what exactly is the sequence? I gather that the second and third arguments are Vec x and Vec y where x is the input and y is the output, but what is the void * supposed to be. > > I'm doing this in Fortran, so I really don't know what argument "void *" should correspond to? > > Currently my code looks like this: > > ! Setup Matrix-Free dRdw matrix > call MatCreateMFFD(sumb_comm_world,nDimW,nDimW,& > PETSC_DETERMINE,PETSC_DETERMINE,dRdw,ierr) > > call MatMFFDSetFunction(dRdw,FormFunction2,ctx,ierr) > call MatAssemblyBegin(dRdw,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(dRdw,MAT_FINAL_ASSEMBLY,ierr) > > The function prototype for FormFunction2 is: > > subroutine FormFunction2(mfmat,wVec,rVec,ierr) > Mat mfmat > Vec wVec, rVec > PetscInt ierr > end subroutine FormFunction2 > > When I try to use this in a KSP linear solve I get the following tracsback: > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatMult line 1877 src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCApplyBAorAB line 540 src/ksp/pc/interface/precon.c > [0]PETSC ERROR: [0] GMREScycle line 132 src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 227 src/ksp/ksp/impls/gmres/gmres.c > > If I use KSPPREONLY it works fine. 
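Putting Barry's calling sequence above into the shape of the routine from the original post, the prototype would look roughly like the following; the declaration of the first (context) argument is an assumption and has to match whatever was actually passed as funcctx to MatMFFDSetFunction.

      subroutine FormFunction2(ctx, wVec, rVec, ierr)
      ! first argument is the user context passed to MatMFFDSetFunction,
      ! not the matrix; here it is assumed to be an application-defined array
      PetscScalar ctx(*)
      Vec wVec, rVec
      PetscErrorCode ierr
      ! compute rVec as the residual evaluated at the state wVec
      ierr = 0
      end subroutine FormFunction2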
> > Thank you, > > Gaetan From aron.ahmadia at kaust.edu.sa Thu Feb 10 07:44:09 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Thu, 10 Feb 2011 16:44:09 +0300 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: References: Message-ID: add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like this: --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ You may have to manually add the MPI libraries and their path as well, since BuildSystem tends to like these packaged together. Ping the list back if you can't figure it out from there. -Aron On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com < lvankampenhout at gmail.com> wrote: > Hi all, i'm having this error when configuring the latest petsc-dev on an > IBM PPC system. > > > TESTING: CxxMPICheck from > config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ------------------------------------------------------------------------------- > C++ error! mpi.h could not be located at: [] > > ******************************************************************************* > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 > --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 > --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 > --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 > --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 > --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" > --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict > -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto > -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar > --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real > PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 > --download-ml --download-hypre > > > vkampenh at p6012:~/install/petsc-dev> module list > Currently Loaded Modulefiles: > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > 2) java/ibm/1.5 5) fortran/ibm/13.1 > 3) c/ibm/11.1 6) sara > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > /opt/ibm/java2-ppc64-50/include/jvmpi.h > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > /opt/mpich/include/mpi.h > /usr/include/boost/mpi.hpp > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure > list, so that it will be recognized? The machine uses LoadLeveler schedule > system, which handles the MPI settings. > > Thanks, > Leo > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Thu Feb 10 11:58:23 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 10 Feb 2011 11:58:23 -0600 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: References: Message-ID: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> Aron, Shouldn't the mpcc and mpfort manage providing the include directories and libraries automatically (like everyone elses mpicc etc does?) Seems very cumbersome that users need to know they strange directories and include them themselves? A real step backwards in usability? Barry On Feb 10, 2011, at 7:44 AM, Aron Ahmadia wrote: > add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like this: > > --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ > > You may have to manually add the MPI libraries and their path as well, since BuildSystem tends to like these packaged together. Ping the list back if you can't figure it out from there. > > -Aron > > On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com wrote: > Hi all, i'm having this error when configuring the latest petsc-dev on an IBM PPC system. > > > TESTING: CxxMPICheck from config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > C++ error! mpi.h could not be located at: [] > ******************************************************************************* > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 --download-ml --download-hypre > > > vkampenh at p6012:~/install/petsc-dev> module list > Currently Loaded Modulefiles: > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > 2) java/ibm/1.5 5) fortran/ibm/13.1 > 3) c/ibm/11.1 6) sara > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > /opt/ibm/java2-ppc64-50/include/jvmpi.h > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > /opt/mpich/include/mpi.h > /usr/include/boost/mpi.hpp > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure list, so that it will be recognized? The machine uses LoadLeveler schedule system, which handles the MPI settings. 
> > Thanks, > Leo > > > From aron.ahmadia at kaust.edu.sa Thu Feb 10 13:45:36 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Thu, 10 Feb 2011 22:45:36 +0300 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> References: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> Message-ID: I was wondering this myself. On the BlueGene line the MPI installation is based on MPICH, so the mpi* compilers behave as you'd expect on any other MPICH install. I'm not familiar with the voodoo in the IBM-HPC toolkit or the intricacies of this particular machine, but since it's obviously being administered by *somebody* (see the modules in Leo's environment), I'd expect the administrators to have gotten it right. A On Thu, Feb 10, 2011 at 8:58 PM, Barry Smith wrote: > > Aron, > > Shouldn't the mpcc and mpfort manage providing the include directories > and libraries automatically (like everyone elses mpicc etc does?) Seems very > cumbersome that users need to know they strange directories and include them > themselves? A real step backwards in usability? > > Barry > > On Feb 10, 2011, at 7:44 AM, Aron Ahmadia wrote: > > > add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like > this: > > > > --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ > > > > You may have to manually add the MPI libraries and their path as well, > since BuildSystem tends to like these packaged together. Ping the list back > if you can't figure it out from there. > > > > -Aron > > > > On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com < > lvankampenhout at gmail.com> wrote: > > Hi all, i'm having this error when configuring the latest petsc-dev on an > IBM PPC system. > > > > > > TESTING: CxxMPICheck from > config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > > ------------------------------------------------------------------------------- > > C++ error! 
mpi.h could not be located at: [] > > > ******************************************************************************* > > > > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 > --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 > --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 > --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 > --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 > --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" > --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict > -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto > -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar > --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real > PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 > --download-ml --download-hypre > > > > > > vkampenh at p6012:~/install/petsc-dev> module list > > Currently Loaded Modulefiles: > > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > > 2) java/ibm/1.5 5) fortran/ibm/13.1 > > 3) c/ibm/11.1 6) sara > > > > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > > /opt/ibm/java2-ppc64-50/include/jvmpi.h > > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > > /opt/mpich/include/mpi.h > > /usr/include/boost/mpi.hpp > > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure > list, so that it will be recognized? The machine uses LoadLeveler schedule > system, which handles the MPI settings. > > > > Thanks, > > Leo > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Fri Feb 11 10:37:14 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Fri, 11 Feb 2011 18:37:14 +0200 Subject: [petsc-users] Guidelines for solving the euler equations with an implicit matrix free approach Message-ID: <4D5565BA.5040403@lycos.com> Dear Petsc team, I'm new in Petsc and I'm trying to solve the euler equations of fluid dynamics using an implicit matrix free approach with a spatial discontinues galerkin discretization. I need some directions about how can I solve the following system: (M/dt+dR/du)*DU=R where R denotes the residual of the system and dR/du the residual jacobian. Please help. 
Kostas From ckontzialis at lycos.com Fri Feb 11 10:37:57 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Fri, 11 Feb 2011 18:37:57 +0200 Subject: [petsc-users] mail Message-ID: <4D5565E5.8090502@lycos.com> ckontzialis at lycos.com From knepley at gmail.com Sat Feb 12 16:12:36 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Feb 2011 16:12:36 -0600 Subject: [petsc-users] Guidelines for solving the euler equations with an implicit matrix free approach In-Reply-To: <4D5565BA.5040403@lycos.com> References: <4D5565BA.5040403@lycos.com> Message-ID: On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis < ckontzialis at lycos.com> wrote: > Dear Petsc team, > > I'm new in Petsc and I'm trying to solve the euler equations of fluid > dynamics > using an implicit matrix free approach with a spatial discontinues galerkin > discretization. > I need some directions about how can I solve the following system: > > (M/dt+dR/du)*DU=R > > where R denotes the residual of the system and dR/du the residual jacobian. > Please help. > Petsc provides linear algebra and nonlinear solvers. This is fine once you have discretized. It sounds like you will use DG: a) on a structured or unstructured grid? The PETSc DA supports structured grids in any dimension. After this, you want to use the TS object to present your system. There are many examples in the TS, e.g. ex10 for radiation-diffusion or ex14 for hydrostatic ice flow. Once you have your problem producing the correct residual and Jacobian, we can talk about solvers. Matt > Kostas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From khalid_eee at yahoo.com Sat Feb 12 20:47:02 2011 From: khalid_eee at yahoo.com (khalid ashraf) Date: Sat, 12 Feb 2011 18:47:02 -0800 (PST) Subject: [petsc-users] Reading vtk file in parallel and assigning to an array Message-ID: <388810.1602.qm@web112617.mail.gq1.yahoo.com> Hi, I have a .vtk file that I want to read. I want to read one floating point from each line of the file and assign the value to an array. If I use standard C commands like fscanf(), then it works on single processor but doesn't keep the right order when run on on multi processors. Could you please give a small code snippet to do it the PETSC way in parallel ? Thanks. ____________________________________________________________________________________ Never miss an email again! Yahoo! Toolbar alerts you the instant new Mail arrives. http://tools.search.yahoo.com/toolbar/features/mail/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Feb 12 21:41:51 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Feb 2011 21:41:51 -0600 Subject: [petsc-users] Reading vtk file in parallel and assigning to an array In-Reply-To: <388810.1602.qm@web112617.mail.gq1.yahoo.com> References: <388810.1602.qm@web112617.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 12, 2011 at 8:47 PM, khalid ashraf wrote: > Hi, I have a .vtk file that I want to read. I want to read one floating > point from each line of the file and assign the value to an array. > If I use standard C commands like fscanf(), then it works on single > processor but doesn't keep the right order when run on on multi processors. > Could you please give a small code snippet to do it the PETSC way in > parallel ? 
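A rough sketch of the approach suggested in the reply just below: read the ASCII data on a single process, write it out in PETSc binary format, and load that file in parallel afterwards. The file names, the one-value-per-line format, and the missing error checks are assumptions, and the calls follow a newer PETSc API than the 3.1-era one used elsewhere in this thread.

#include <stdio.h>
#include <petscvec.h>

/* Serial converter: one value per line of ASCII input -> PETSc binary vector. */
int main(int argc, char **argv)
{
  Vec            v;
  PetscViewer    viewer;
  FILE           *fp;
  PetscInt       i, n = 0;
  double         val;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  fp = fopen("data.txt", "r");                       /* run this on one process only */
  while (fscanf(fp, "%lf", &val) == 1) n++;          /* pass 1: count the entries */
  rewind(fp);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &v);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                          /* pass 2: fill in file order */
    if (fscanf(fp, "%lf", &val) != 1) break;
    ierr = VecSetValue(v, i, (PetscScalar)val, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);
  ierr = VecAssemblyBegin(v);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(v);CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "data.dat", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(v, viewer);CHKERRQ(ierr);           /* writes the PETSc binary format */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = VecDestroy(&v);CHKERRQ(ierr);
  return PetscFinalize();
}

The parallel program then opens the same file with PetscViewerBinaryOpen on PETSC_COMM_WORLD and calls VecLoad (the exact VecLoad calling sequence differs between the 3.1-era release used in this thread and later releases); each rank receives a contiguous block of the vector in the original file order.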
> The easiest way is to read it in serial, and save it in PETSc binary format. Then it can be loaded in parallel. Matt > Thanks. > > > > ------------------------------ > Don't be flakey. Get Yahoo! Mail for Mobileand > always stay connectedto friends. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Sun Feb 13 16:37:11 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 13 Feb 2011 17:37:11 -0500 Subject: [petsc-users] Better way to pre-allocate memory for matrix being read in ??? Message-ID: Hi, I have a text file containing the non-zero entries of a sparse matrix of dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, element] format. That is ALL the information I have regarding the matrix. However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be known, nz=number of nonzeros per row (same for all rows) nnz=array containing the number of nonzeros in the various rows (possibly different for each row) or PETSC_NULL which i do not know for my matrix, ( unless I resort to using MATLAB. ). Does that mean I have to set nz= 1274 (the length of a row) and nnz=PETSC_NULL ? Though, I guess this setting would consume a lot of memory for higher order matrices. How then, should I go about memory pre-allocation more efficiently? Thanks, Gaurish Telang There is a code in the PETSc folder (/src/mat/examples/tests/ex78.c ) which reads in a matrix of this format. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sun Feb 13 16:47:46 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 13 Feb 2011 23:47:46 +0100 Subject: [petsc-users] Better way to pre-allocate memory for matrix being read in ??? In-Reply-To: References: Message-ID: On Sun, Feb 13, 2011 at 23:37, Gaurish Telang wrote: > I have a text file containing the non-zero entries of a sparse matrix of > dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, > element] format. > This is a horrible format and can not scale. If it becomes a performance issue, change the format. > > That is ALL the information I have regarding the matrix. > > However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat > B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be > known, > > nz=number of nonzeros per row (same for all rows) > nnz=array containing the number of nonzeros in the various rows (possibly > different for each row) or PETSC_NULL > > which i do not know for my matrix, ( unless I resort to using MATLAB. ). > > Does that mean I have to set nz= 1274 (the length of a row) and > nnz=PETSC_NULL ? > No, read the file twice. The first time through, just count the number of nonzeros (per row), then set preallocation with the correct size, and finally read the file a second time calling MatSetValue() for each entry. For a matrix this small, you could just read it in without preallocating, but that will get too expensive quickly if you increase the matrix size. -------------- next part -------------- An HTML attachment was scrubbed... 
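A sketch of the two-pass scheme described above for the 2683 x 1274 matrix stored as (row, column, value) triples: count the nonzeros per row, preallocate with those counts, then read the file again and insert. The file name and the assumption of 0-based indices are made up for illustration, and PetscCalloc1 is from newer PETSc releases.

#include <stdio.h>
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  FILE           *fp;
  PetscInt       m = 2683, n = 1274, *nnz;   /* sizes taken from the question */
  int            ii, jj;
  double         val;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscCalloc1(m, &nnz);CHKERRQ(ierr);

  /* pass 1: count nonzeros per row (assumes 0-based indices; subtract 1 if the file is 1-based) */
  fp = fopen("matrix.txt", "r");
  while (fscanf(fp, "%d %d %lf", &ii, &jj, &val) == 3) nnz[ii]++;
  rewind(fp);

  /* preallocate with the exact per-row counts, then insert in pass 2 */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 0, nnz, &A);CHKERRQ(ierr);
  while (fscanf(fp, "%d %d %lf", &ii, &jj, &val) == 3) {
    ierr = MatSetValue(A, ii, jj, (PetscScalar)val, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree(nnz);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  return PetscFinalize();
}

For a parallel MPIAIJ matrix the same idea applies, except that the first pass has to count diagonal-block and off-diagonal-block entries separately and feed MatMPIAIJSetPreallocation.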
URL: From gaurish108 at gmail.com Sun Feb 13 18:16:47 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 13 Feb 2011 19:16:47 -0500 Subject: [petsc-users] Question on LOCDIR Message-ID: Hi, I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the LOCDIR variable defined at the top, to be the current working directory. Whereas in the makefile chapter of the manual, the given template makefile has no mention of the LOCDIR variable. So, what is the significance of this variable? My PETSc programs seem to compile fine without introducing it into my makefile. Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Sun Feb 13 20:12:13 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 04:12:13 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: References: Message-ID: <4D588F7D.2060209@lycos.com> On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Re: Guidelines for solving the euler equations with an > implicit matrix free approach (Matthew Knepley) > 2. Reading vtk file in parallel and assigning to an array > (khalid ashraf) > 3. Re: Reading vtk file in parallel and assigning to an array > (Matthew Knepley) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 12 Feb 2011 16:12:36 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] Guidelines for solving the euler equations > with an implicit matrix free approach > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Dear Petsc team, >> >> I'm new in Petsc and I'm trying to solve the euler equations of fluid >> dynamics >> using an implicit matrix free approach with a spatial discontinues galerkin >> discretization. >> I need some directions about how can I solve the following system: >> >> (M/dt+dR/du)*DU=R >> >> where R denotes the residual of the system and dR/du the residual jacobian. >> Please help. >> > Petsc provides linear algebra and nonlinear solvers. This is fine once you > have discretized. > It sounds like you will use DG: > > a) on a structured or unstructured grid? > > The PETSc DA supports structured grids in any dimension. After this, you > want to use the > TS object to present your system. There are many examples in the TS, e.g. > ex10 for > radiation-diffusion or ex14 for hydrostatic ice flow. > > Once you have your problem producing the correct residual and Jacobian, we > can talk > about solvers. > > Matt > > >> Kostas >> Mat, Thank you for your reply. I have done the discretization and the residual and jacobian are computes correctly. Furthermore, I managed to do some calculation using TS but with an explicit scheme. I need to work with an implicit time discretization and I have read in a quite few papers that they follow the matrix free approach. 
Thank you, Kostas From knepley at gmail.com Sun Feb 13 20:18:42 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 13 Feb 2011 20:18:42 -0600 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: <4D588F7D.2060209@lycos.com> References: <4D588F7D.2060209@lycos.com> Message-ID: On Sun, Feb 13, 2011 at 8:12 PM, Kontsantinos Kontzialis < ckontzialis at lycos.com> wrote: > Message: 1 >> Date: Sat, 12 Feb 2011 16:12:36 -0600 >> From: Matthew Knepley >> Subject: Re: [petsc-users] Guidelines for solving the euler equations >> with an implicit matrix free approach >> To: PETSc users list >> Message-ID: >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >> ckontzialis at lycos.com> wrote: >> >> Dear Petsc team, >>> >>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>> dynamics >>> using an implicit matrix free approach with a spatial discontinues >>> galerkin >>> discretization. >>> I need some directions about how can I solve the following system: >>> >>> (M/dt+dR/du)*DU=R >>> >>> where R denotes the residual of the system and dR/du the residual >>> jacobian. >>> Please help. >>> >>> Petsc provides linear algebra and nonlinear solvers. This is fine once >> you >> have discretized. >> It sounds like you will use DG: >> >> a) on a structured or unstructured grid? >> >> The PETSc DA supports structured grids in any dimension. After this, you >> want to use the >> TS object to present your system. There are many examples in the TS, e.g. >> ex10 for >> radiation-diffusion or ex14 for hydrostatic ice flow. >> >> Once you have your problem producing the correct residual and Jacobian, we >> can talk >> about solvers. >> >> Matt >> >> >> Kostas >>> >>> Mat, > > Thank you for your reply. I have done the discretization and the residual > and jacobian are computes correctly. Furthermore, I managed to do some > calculation using TS but with an explicit scheme. I need to work with an > implicit time discretization and I have read in a quite few papers that they > follow the matrix free approach. > Once you plug your Residual and Jacobian into the TS, you can start to try out different solvers. Is this working? Thanks, Matt > Thank you, > > Kostas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Sun Feb 13 20:24:16 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 04:24:16 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: References: Message-ID: <4D589250.4000607@lycos.com> On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Re: Guidelines for solving the euler equations with an > implicit matrix free approach (Matthew Knepley) > 2. 
Reading vtk file in parallel and assigning to an array > (khalid ashraf) > 3. Re: Reading vtk file in parallel and assigning to an array > (Matthew Knepley) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 12 Feb 2011 16:12:36 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] Guidelines for solving the euler equations > with an implicit matrix free approach > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Dear Petsc team, >> >> I'm new in Petsc and I'm trying to solve the euler equations of fluid >> dynamics >> using an implicit matrix free approach with a spatial discontinues galerkin >> discretization. >> I need some directions about how can I solve the following system: >> >> (M/dt+dR/du)*DU=R >> >> where R denotes the residual of the system and dR/du the residual jacobian. >> Please help. >> > Petsc provides linear algebra and nonlinear solvers. This is fine once you > have discretized. > It sounds like you will use DG: > > a) on a structured or unstructured grid? > > The PETSc DA supports structured grids in any dimension. After this, you > want to use the > TS object to present your system. There are many examples in the TS, e.g. > ex10 for > radiation-diffusion or ex14 for hydrostatic ice flow. > > Once you have your problem producing the correct residual and Jacobian, we > can talk > about solvers. > > Matt > > >> Kostas >> Matt, Also, I work on unstructured grids in 2d. Kostas From bsmith at mcs.anl.gov Sun Feb 13 20:38:33 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 13 Feb 2011 20:38:33 -0600 Subject: [petsc-users] Question on LOCDIR In-Reply-To: References: Message-ID: LOCDIR is used when making manual pages and links to examples. It is not needed for running codes Barry On Feb 13, 2011, at 6:16 PM, Gaurish Telang wrote: > Hi, > > I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the LOCDIR variable defined at the top, to be the current working directory. Whereas in the makefile chapter of the manual, > > the given template makefile has no mention of the LOCDIR variable. So, what is the significance of this variable? > > My PETSc programs seem to compile fine without introducing it into my makefile. > > Gaurish > From ckontzialis at lycos.com Sun Feb 13 21:09:51 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 05:09:51 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 31 In-Reply-To: References: Message-ID: <4D589CFF.7000703@lycos.com> On 02/14/2011 04:38 AM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Better way to pre-allocate memory for matrix being read in > ??? (Gaurish Telang) > 2. Re: Better way to pre-allocate memory for matrix being read > in ??? (Jed Brown) > 3. Question on LOCDIR (Gaurish Telang) > 4. 
Re: petsc-users Digest, Vol 26, Issue 30 > (Kontsantinos Kontzialis) > 5. Re: petsc-users Digest, Vol 26, Issue 30 (Matthew Knepley) > 6. Re: petsc-users Digest, Vol 26, Issue 30 > (Kontsantinos Kontzialis) > 7. Re: Question on LOCDIR (Barry Smith) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 13 Feb 2011 17:37:11 -0500 > From: Gaurish Telang > Subject: [petsc-users] Better way to pre-allocate memory for matrix > being read in ??? > To: petsc-users at mcs.anl.gov > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > I have a text file containing the non-zero entries of a sparse matrix of > dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, > element] format. > > That is ALL the information I have regarding the matrix. > > However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat > B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be > known, > > nz=number of nonzeros per row (same for all rows) > nnz=array containing the number of nonzeros in the various rows (possibly > different for each row) or > PETSC_NULL > > which i do not know for my matrix, ( unless I resort to using MATLAB. ). > > Does that mean I have to set nz= 1274 (the length of a row) and > nnz=PETSC_NULL ? Though, I guess this setting would consume a lot of > memory for higher order matrices. > > How then, should I go about memory pre-allocation more efficiently? > > Thanks, > Gaurish Telang > > > > > > > > > > There is a code in the PETSc folder (/src/mat/examples/tests/ex78.c ) which > reads in a matrix of this format. > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Sun, 13 Feb 2011 23:47:46 +0100 > From: Jed Brown > Subject: Re: [petsc-users] Better way to pre-allocate memory for > matrix being read in ??? > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > On Sun, Feb 13, 2011 at 23:37, Gaurish Telang wrote: > >> I have a text file containing the non-zero entries of a sparse matrix of >> dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, >> element] format. >> > This is a horrible format and can not scale. If it becomes a performance > issue, change the format. > > >> That is ALL the information I have regarding the matrix. >> >> However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat >> B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be >> known, >> >> nz=number of nonzeros per row (same for all rows) >> nnz=array containing the number of nonzeros in the various rows (possibly >> different for each row) or PETSC_NULL >> >> which i do not know for my matrix, ( unless I resort to using MATLAB. ). >> >> Does that mean I have to set nz= 1274 (the length of a row) and >> nnz=PETSC_NULL ? >> > No, read the file twice. The first time through, just count the number of > nonzeros (per row), then set preallocation with the correct size, and > finally read the file a second time calling MatSetValue() for each entry. > > For a matrix this small, you could just read it in without preallocating, > but that will get too expensive quickly if you increase the matrix size. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 3 > Date: Sun, 13 Feb 2011 19:16:47 -0500 > From: Gaurish Telang > Subject: [petsc-users] Question on LOCDIR > To: petsc-users at mcs.anl.gov > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the > LOCDIR variable defined at the top, to be the current working directory. > Whereas in the makefile chapter of the manual, > > the given template makefile has no mention of the LOCDIR variable. So, what > is the significance of this variable? > > My PETSc programs seem to compile fine without introducing it into my > makefile. > > Gaurish > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 4 > Date: Mon, 14 Feb 2011 04:12:13 +0200 > From: Kontsantinos Kontzialis > Subject: Re: [petsc-users] petsc-users Digest, Vol 26, Issue 30 > To: petsc-users at mcs.anl.gov > Message-ID:<4D588F7D.2060209 at lycos.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: >> Send petsc-users mailing list submissions to >> petsc-users at mcs.anl.gov >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://lists.mcs.anl.gov/mailman/listinfo/petsc-users >> or, via email, send a message with subject or body 'help' to >> petsc-users-request at mcs.anl.gov >> >> You can reach the person managing the list at >> petsc-users-owner at mcs.anl.gov >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of petsc-users digest..." >> >> >> Today's Topics: >> >> 1. Re: Guidelines for solving the euler equations with an >> implicit matrix free approach (Matthew Knepley) >> 2. Reading vtk file in parallel and assigning to an array >> (khalid ashraf) >> 3. Re: Reading vtk file in parallel and assigning to an array >> (Matthew Knepley) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sat, 12 Feb 2011 16:12:36 -0600 >> From: Matthew Knepley >> Subject: Re: [petsc-users] Guidelines for solving the euler equations >> with an implicit matrix free approach >> To: PETSc users list >> Message-ID: >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >> ckontzialis at lycos.com> wrote: >> >>> Dear Petsc team, >>> >>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>> dynamics >>> using an implicit matrix free approach with a spatial discontinues galerkin >>> discretization. >>> I need some directions about how can I solve the following system: >>> >>> (M/dt+dR/du)*DU=R >>> >>> where R denotes the residual of the system and dR/du the residual jacobian. >>> Please help. >>> >> Petsc provides linear algebra and nonlinear solvers. This is fine once you >> have discretized. >> It sounds like you will use DG: >> >> a) on a structured or unstructured grid? >> >> The PETSc DA supports structured grids in any dimension. After this, you >> want to use the >> TS object to present your system. There are many examples in the TS, e.g. >> ex10 for >> radiation-diffusion or ex14 for hydrostatic ice flow. >> >> Once you have your problem producing the correct residual and Jacobian, we >> can talk >> about solvers. >> >> Matt >> >> >>> Kostas >>> > Mat, > > Thank you for your reply. 
I have done the discretization and the > residual and jacobian are computes correctly. Furthermore, I managed to > do some calculation using TS but with an explicit scheme. I need to work > with an implicit time discretization and I have read in a quite few > papers that they follow the matrix free approach. > > Thank you, > > Kostas > > > ------------------------------ > > Message: 5 > Date: Sun, 13 Feb 2011 20:18:42 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] petsc-users Digest, Vol 26, Issue 30 > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Sun, Feb 13, 2011 at 8:12 PM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Message: 1 >>> Date: Sat, 12 Feb 2011 16:12:36 -0600 >>> From: Matthew Knepley >>> Subject: Re: [petsc-users] Guidelines for solving the euler equations >>> with an implicit matrix free approach >>> To: PETSc users list >>> Message-ID: >>> >>> Content-Type: text/plain; charset="iso-8859-1" >>> >>> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >>> ckontzialis at lycos.com> wrote: >>> >>> Dear Petsc team, >>>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>>> dynamics >>>> using an implicit matrix free approach with a spatial discontinues >>>> galerkin >>>> discretization. >>>> I need some directions about how can I solve the following system: >>>> >>>> (M/dt+dR/du)*DU=R >>>> >>>> where R denotes the residual of the system and dR/du the residual >>>> jacobian. >>>> Please help. >>>> >>>> Petsc provides linear algebra and nonlinear solvers. This is fine once >>> you >>> have discretized. >>> It sounds like you will use DG: >>> >>> a) on a structured or unstructured grid? >>> >>> The PETSc DA supports structured grids in any dimension. After this, you >>> want to use the >>> TS object to present your system. There are many examples in the TS, e.g. >>> ex10 for >>> radiation-diffusion or ex14 for hydrostatic ice flow. >>> >>> Once you have your problem producing the correct residual and Jacobian, we >>> can talk >>> about solvers. >>> >>> Matt >>> >>> >>> Kostas >>>> Mat, >> Thank you for your reply. I have done the discretization and the residual >> and jacobian are computes correctly. Furthermore, I managed to do some >> calculation using TS but with an explicit scheme. I need to work with an >> implicit time discretization and I have read in a quite few papers that they >> follow the matrix free approach. >> > Once you plug your Residual and Jacobian into the TS, you can start to try > out different solvers. Is this working? 
> > Thanks, > > Matt > > >> Thank you, >> >> Kostas >> > > Matt, here is a fragment from my code where I try to work on the implicit scheme // Apply initial conditions ierr = initial_conditions(sys); CHKERRQ(ierr); ierr = TSCreate(sys.comm, &sys.ts); CHKERRQ(ierr); ierr = TSSetSolution(sys.ts, sys.gsv); CHKERRQ(ierr); ierr = TSSetFromOptions(sys.ts); CHKERRQ(ierr); ierr = TSSetProblemType(sys.ts, TS_NONLINEAR); CHKERRQ(ierr); ierr = TSSetType(sys.ts, TSBEULER); CHKERRQ(ierr); ierr = TSSetRHSFunction(sys.ts, base_residual, &sys); CHKERRQ(ierr); ierr = TSGetSNES(sys.ts, &sys.snes); CHKERRQ(ierr); ierr = SNESSetFromOptions(sys.snes); CHKERRQ(ierr); ierr = MatCreateSNESMF(sys.snes, &sys.J); CHKERRQ(ierr); ierr = TSSetRHSJacobian(sys.ts, sys.J, sys.J, jacobian_matrix, &sys); CHKERRQ(ierr); ierr = SNESGetKSP(sys.snes, &sys.ksp); CHKERRQ(ierr) ierr = MatScale(sys.M, 1.0 / sys.con->dt); CHKERRQ(ierr); ierr = MatAYPX(sys.J, -1.0, sys.M, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr); ierr = KSPSetOperators(sys.ksp, sys.J, sys.J, SAME_NONZERO_PATTERN); CHKERRQ(ierr); sys.con->j = 0; sys.con->tm = 0; ierr = TSSetDuration(sys.ts, 10000, sys.con->etime); CHKERRQ(ierr); ierr = TSMonitorSet(sys.ts, monitor, &sys, PETSC_NULL); CHKERRQ(ierr); ierr = PetscMalloc (sys.ldof*sizeof (PetscScalar ),&sys.Lim); CHKERRQ(ierr); ierr = TSSetSolution(sys.ts, PETSC_NULL); CHKERRQ(ierr); sys is my application context. Petsc tells me that I should call first the snessetfunction due to MatCreateSNESMF, 'cause I want to use an MF approach. what can I do? on an explicit run I do not use the jacobian, but how can I use it there now with a MF? Im confused. Thank you, kostas From bsmith at mcs.anl.gov Sun Feb 13 21:45:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 13 Feb 2011 21:45:47 -0600 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 31 In-Reply-To: <4D589CFF.7000703@lycos.com> References: <4D589CFF.7000703@lycos.com> Message-ID: On Feb 13, 2011, at 9:09 PM, Kontsantinos Kontzialis wrote: Don't get the snes and ksp and do kspsetoperators() instead use TSSetRHSJacobian() Do not use MatCreateSNESMF() that is for working with the nonlinear solver use MatCreateMFFD() you then need to call MatMFFDSetFunction() to give it the function it will provide the derivative of. 
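A compressed C outline of the sequence described here and completed in the next sentence: create the matrix-free operator with MatCreateMFFD, give it the residual with MatMFFDSetFunction, and have the TSSetRHSJacobian callback do nothing but refresh the base point with MatMFFDSetBase and assemble. The context, function names, and sizes are invented; the Jacobian callback signature shown is the one used by newer PETSc releases (the 3.1-era interface passes Mat pointers and a MatStructure flag); and how the M/dt mass-matrix shift is combined with this operator depends on the PETSc version, so treat this as an outline rather than a drop-in replacement.

#include <petscts.h>

typedef struct { PetscReal t; } AppCtx;   /* illustrative context; holds whatever the residual needs */

/* The residual in the (void*,Vec,Vec) form that MatMFFDSetFunction expects */
static PetscErrorCode MFFDResidual(void *ctx, Vec U, Vec R)
{
  AppCtx *user = (AppCtx*)ctx;
  PetscFunctionBeginUser;
  /* evaluate R = R(U) here, using user->t if the residual is time dependent */
  (void)user; (void)U; (void)R;
  PetscFunctionReturn(0);
}

/* TSSetRHSJacobian callback: no entries are computed, only the differencing base point is refreshed */
static PetscErrorCode RHSJacobianMF(TS ts, PetscReal t, Vec U, Mat J, Mat P, void *ctx)
{
  AppCtx        *user = (AppCtx*)ctx;
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  user->t = t;
  ierr = MatMFFDSetBase(J, U, NULL);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Wiring to be called from the existing setup code (ts, ndof, and user already exist there) */
static PetscErrorCode SetupMatrixFreeJacobian(TS ts, PetscInt ndof, AppCtx *user, Mat *J)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = MatCreateMFFD(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ndof, ndof, J);CHKERRQ(ierr);
  ierr = MatMFFDSetFunction(*J, MFFDResidual, user);CHKERRQ(ierr);
  ierr = TSSetRHSJacobian(ts, *J, *J, RHSJacobianMF, user);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}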
You need to pass a function to TSSetRHSJacobian() that sets the MatMFFDSetBase() and then calls MatAssemblyBegin/End() Barry > > here is a fragment from my code where I try to work on the implicit scheme > > // Apply initial conditions > ierr = initial_conditions(sys); > CHKERRQ(ierr); > > ierr = TSCreate(sys.comm, &sys.ts); > CHKERRQ(ierr); > > ierr = TSSetSolution(sys.ts, sys.gsv); > CHKERRQ(ierr); > > ierr = TSSetFromOptions(sys.ts); > CHKERRQ(ierr); > > ierr = TSSetProblemType(sys.ts, TS_NONLINEAR); > CHKERRQ(ierr); > > ierr = TSSetType(sys.ts, TSBEULER); > CHKERRQ(ierr); > > ierr = TSSetRHSFunction(sys.ts, base_residual, &sys); > CHKERRQ(ierr); > > ierr = TSGetSNES(sys.ts, &sys.snes); > CHKERRQ(ierr); > > ierr = SNESSetFromOptions(sys.snes); > CHKERRQ(ierr); > > ierr = MatCreateSNESMF(sys.snes, &sys.J); > CHKERRQ(ierr); > > ierr = TSSetRHSJacobian(sys.ts, sys.J, sys.J, jacobian_matrix, &sys); > CHKERRQ(ierr); > > ierr = SNESGetKSP(sys.snes, &sys.ksp); > CHKERRQ(ierr) > > ierr = MatScale(sys.M, 1.0 / sys.con->dt); > CHKERRQ(ierr); > > ierr = MatAYPX(sys.J, -1.0, sys.M, DIFFERENT_NONZERO_PATTERN); > CHKERRQ(ierr); > > ierr = KSPSetOperators(sys.ksp, sys.J, sys.J, SAME_NONZERO_PATTERN); > CHKERRQ(ierr); > > sys.con->j = 0; > sys.con->tm = 0; > > ierr = TSSetDuration(sys.ts, 10000, sys.con->etime); > CHKERRQ(ierr); > > ierr = TSMonitorSet(sys.ts, monitor, &sys, PETSC_NULL); > CHKERRQ(ierr); > > ierr = PetscMalloc (sys.ldof*sizeof (PetscScalar ),&sys.Lim); > CHKERRQ(ierr); > > ierr = TSSetSolution(sys.ts, PETSC_NULL); > CHKERRQ(ierr); > > sys is my application context. Petsc tells me that I should call first the snessetfunction due to > MatCreateSNESMF, 'cause I want to use an MF approach. what can I do? on an explicit run I do > not use the jacobian, but how can I use it there now with a MF? Im confused. > > Thank you, > > kostas From tomjan at jay.au.poznan.pl Mon Feb 14 03:13:51 2011 From: tomjan at jay.au.poznan.pl (Tomasz Jankowski) Date: Mon, 14 Feb 2011 10:13:51 +0100 (CET) Subject: [petsc-users] query about parallel REML Message-ID: Hello All, I'm looking for some opensource/free copy of parallel reml (best based on PETSC). I have found old post at https://stat.ethz.ch/pipermail/r-help/2004-May/050436.html which is directing to acre-developers at eml.pnl.gov. But it seems that this email is not active. I have also try with JM.Malard at pnl.gov email but it's also not active. So I'm writing here. Does anyone have copy of this software? Could you share it? Many Thanks, Tomasz Jankowski ######################################################## # tomjan at jay.au.poznan.pl # # jay.au.poznan.pl/~tomjan/ # ######################################################## From fernandez858 at gmail.com Mon Feb 14 04:27:51 2011 From: fernandez858 at gmail.com (Michel Cancelliere) Date: Mon, 14 Feb 2011 11:27:51 +0100 Subject: [petsc-users] Solver Parameter Optimization Message-ID: Dear users, I've implemented a simple hydrocarbon reservoir simulator using PETSc, the simulator is used inside an iterative loop in which thousand of simulations are run with different input parameters(In order to calibrate the properties of the model). I would like to use those iterations to tuneup the parameters of the solver (precoditioner,type of linear solver, restart, etc...), Have someone working with that?, Do you know some papers where I can some information about that? Thank you for your time, Michel -------------- next part -------------- An HTML attachment was scrubbed... 
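This does not address the automatic-tuning question itself, but as a sketch of the mechanics: the solver settings can be swept from inside the calibration loop through the options database, so thousands of runs can try different configurations without recompiling. The candidate strings and the function name are invented, and PetscOptionsInsertString takes only the string (no options object) in older PETSc releases.

#include <petscksp.h>

/* Candidate solver settings to sweep; purely illustrative. */
static const char *candidates[] = {
  "-ksp_type gmres -ksp_gmres_restart 30 -pc_type bjacobi",
  "-ksp_type bcgs -pc_type asm",
  "-ksp_type gmres -pc_type sor",
};

/* Solve the same system once per candidate and report the iteration count.
   Note that inserted options accumulate in the global database; a real sweep would
   clear them or give each configuration its own options prefix. */
static PetscErrorCode TrySolverCandidates(Mat A, Vec b, Vec x)
{
  PetscInt       i, its;
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  for (i = 0; i < 3; i++) {
    ierr = PetscOptionsInsertString(NULL, candidates[i]);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* picks up the inserted options */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "%s : %d iterations\n", candidates[i], (int)its);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}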
URL: From aron.ahmadia at kaust.edu.sa Mon Feb 14 04:35:50 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Mon, 14 Feb 2011 13:35:50 +0300 Subject: [petsc-users] Solver Parameter Optimization In-Reply-To: References: Message-ID: I've seen a few threads in this direction: See Sanjukta Bhowmick's work on combining machine learning with PETSc to start: http://cs.unomaha.edu/~bhowmick/Blog/Entries/2010/9/12_Solvers_for_Large_Sparse_Linear_Systems.html HYPRE has something along the lines of this as well, but I have not seen any promising results. Don't forget that even slightly different problems can have wildly different convergence properties, you want a solver that is both fast and robust to changes in your input parameters. A On Mon, Feb 14, 2011 at 1:27 PM, Michel Cancelliere wrote: > Dear users, > > I've implemented a simple hydrocarbon reservoir simulator using PETSc, the > simulator is used inside an iterative loop in which thousand of simulations > are run with different input parameters(In order to calibrate the properties > of the model). I would like to use those iterations to tuneup the parameters > of the solver (precoditioner,type of linear solver, restart, etc...), Have > someone working with that?, Do you know some papers where I can some > information about that? > > Thank you for your time, > > Michel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fernandez858 at gmail.com Mon Feb 14 04:40:19 2011 From: fernandez858 at gmail.com (Michel Cancelliere) Date: Mon, 14 Feb 2011 11:40:19 +0100 Subject: [petsc-users] Solver Parameter Optimization In-Reply-To: References: Message-ID: Thank you Aron, I'll check it On Mon, Feb 14, 2011 at 11:35 AM, Aron Ahmadia wrote: > I've seen a few threads in this direction: > > See Sanjukta Bhowmick's work on combining machine learning with PETSc to > start: > http://cs.unomaha.edu/~bhowmick/Blog/Entries/2010/9/12_Solvers_for_Large_Sparse_Linear_Systems.html > > HYPRE has something along the lines of this as well, but I have not seen > any promising results. > > Don't forget that even slightly different problems can have wildly > different convergence properties, you want a solver that is both fast and > robust to changes in your input parameters. > > A > > > On Mon, Feb 14, 2011 at 1:27 PM, Michel Cancelliere < > fernandez858 at gmail.com> wrote: > >> Dear users, >> >> I've implemented a simple hydrocarbon reservoir simulator using PETSc, the >> simulator is used inside an iterative loop in which thousand of simulations >> are run with different input parameters(In order to calibrate the properties >> of the model). I would like to use those iterations to tuneup the parameters >> of the solver (precoditioner,type of linear solver, restart, etc...), Have >> someone working with that?, Do you know some papers where I can some >> information about that? >> >> Thank you for your time, >> >> Michel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Mon Feb 14 11:01:07 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Mon, 14 Feb 2011 12:01:07 -0500 Subject: [petsc-users] Trouble understanding -vec_view output. Message-ID: Hi, I am having trouble understanding the -vec_view output of the simple code I have pasted underneath. In it, I am just reading two PetscBinary files created with a stand alone code. one containing a matrix and another containing a vector. 
However on doing -vec_view during run-time, I get a sequence of zeros before the actual vector of the binary file is printed. But when I read the PetscBinary file in MATLAB I get the correct vector. Why does this happen? Is it because I am using vector type VECMPI to load the binary file (* ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); *)?? e.g. The ASCII text file (BEFORE converting to binary) with standalone code looks like 4 6 The output I get on -vec_view is 0 0 4 6 But with VecGetSize I get the vector length to be 2. !!! Thank you, Gaurish %--------------------------------------------- Code: int main(int argc,char **args) { Mat A ; Vec b ; PetscTruth flg_A,flg_b ; PetscErrorCode ierr ; PetscInt m,n,length ; char file_A[PETSC_MAX_PATH_LEN],file_b[PETSC_MAX_PATH_LEN] ; PetscViewer fd_A, fd_b ; PetscInitialize(&argc,&args,(char *)0,help); /* Get the option typed from the terminal */ ierr = PetscOptionsGetString(PETSC_NULL,"-matrix",file_A,PETSC_MAX_PATH_LEN-1,&flg_A);CHKERRQ(ierr); if (!flg_A) SETERRQ(1,"Must indicate binary matrix matrix file with the -matrix option"); ierr = PetscOptionsGetString(PETSC_NULL,"-vector",file_b,PETSC_MAX_PATH_LEN-1,&flg_b);CHKERRQ(ierr); if (!flg_b) SETERRQ(1,"Must indicate binary matrix matrix file with the -vector option"); /* Load the matrix and vector */ ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_A,FILE_MODE_READ,&fd_A);CHKERRQ(ierr); ierr = MatLoad(fd_A,MATMPIAIJ,&A);CHKERRQ(ierr); ierr = PetscViewerDestroy(fd_A);CHKERRQ(ierr); //ierr=MatView(A,PETSC_VIEWER_DRAW_WORLD);CHKERRQ(ierr); ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_b,FILE_MODE_READ,&fd_b);CHKERRQ(ierr); ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); ierr = PetscViewerDestroy(fd_b);CHKERRQ(ierr); /* Simple Cursory checks */ ierr = MatGetSize(A,&m,&n);CHKERRQ(ierr); ierr=PetscPrintf(PETSC_COMM_WORLD,"\n %i %i \n",m,n);CHKERRQ(ierr); ierr=VecGetSize(b,&length);CHKERRQ(ierr); ierr=PetscPrintf(PETSC_COMM_WORLD,"%i \n",length);CHKERRQ(ierr); //ierr=VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* Destroy Objects. */ MatDestroy(A); VecDestroy(b); ierr = PetscFinalize();CHKERRQ(ierr); sleep(4); return 0; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 14 11:11:23 2011 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Feb 2011 11:11:23 -0600 Subject: [petsc-users] Trouble understanding -vec_view output. In-Reply-To: References: Message-ID: On Mon, Feb 14, 2011 at 11:01 AM, Gaurish Telang wrote: > Hi, > > I am having trouble understanding the -vec_view output of the simple code I > have pasted underneath. In it, I am just reading two PetscBinary files > created with a stand alone code. one containing a matrix and another > containing a vector. > > However on doing -vec_view during run-time, I get a sequence of zeros > before the actual vector of the binary file is printed. But when I read the > PetscBinary file in MATLAB I get the correct vector. > > Why does this happen? Is it because I am using vector type VECMPI to load > the binary file (* ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); *)?? > > e.g. > > The ASCII text file (BEFORE converting to binary) with standalone code > looks like > > 4 > 6 > > The output I get on -vec_view is > > 0 > 0 > 4 > 6 > > But with VecGetSize I get the vector length to be 2. !!! > Send the entire output and input files to petsc-maint at mcs.anl.gov. I guarantee you that this is just misunderstanding, but its impossible to see exactly what you are doing from this sample. 
For instance, in parallel -vec_view will print Process [k]. Matt > Thank you, > > Gaurish > > > %--------------------------------------------- > Code: > int main(int argc,char **args) > { > Mat A ; > Vec b ; > PetscTruth flg_A,flg_b ; > PetscErrorCode ierr ; > PetscInt m,n,length ; > char > file_A[PETSC_MAX_PATH_LEN],file_b[PETSC_MAX_PATH_LEN] ; > PetscViewer fd_A, fd_b ; > > PetscInitialize(&argc,&args,(char *)0,help); > > /* Get the option typed from the terminal */ > ierr = > PetscOptionsGetString(PETSC_NULL,"-matrix",file_A,PETSC_MAX_PATH_LEN-1,&flg_A);CHKERRQ(ierr); > if (!flg_A) SETERRQ(1,"Must indicate binary matrix matrix file with the > -matrix option"); > > ierr = > PetscOptionsGetString(PETSC_NULL,"-vector",file_b,PETSC_MAX_PATH_LEN-1,&flg_b);CHKERRQ(ierr); > if (!flg_b) SETERRQ(1,"Must indicate binary matrix matrix file with the > -vector option"); > > /* Load the matrix and vector */ > ierr = > PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_A,FILE_MODE_READ,&fd_A);CHKERRQ(ierr); > ierr = MatLoad(fd_A,MATMPIAIJ,&A);CHKERRQ(ierr); > ierr = PetscViewerDestroy(fd_A);CHKERRQ(ierr); > > //ierr=MatView(A,PETSC_VIEWER_DRAW_WORLD);CHKERRQ(ierr); > > > ierr = > PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_b,FILE_MODE_READ,&fd_b);CHKERRQ(ierr); > ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); > ierr = PetscViewerDestroy(fd_b);CHKERRQ(ierr); > > /* Simple Cursory checks */ > ierr = MatGetSize(A,&m,&n);CHKERRQ(ierr); > ierr=PetscPrintf(PETSC_COMM_WORLD,"\n %i %i \n",m,n);CHKERRQ(ierr); > > ierr=VecGetSize(b,&length);CHKERRQ(ierr); > ierr=PetscPrintf(PETSC_COMM_WORLD,"%i \n",length);CHKERRQ(ierr); > //ierr=VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > /* Destroy Objects. */ > MatDestroy(A); > VecDestroy(b); > > ierr = PetscFinalize();CHKERRQ(ierr); > > sleep(4); > > return 0; > } > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Mon Feb 14 11:13:29 2011 From: jed at 59A2.org (Jed Brown) Date: Mon, 14 Feb 2011 18:13:29 +0100 Subject: [petsc-users] Trouble understanding -vec_view output. In-Reply-To: References: Message-ID: On Mon, Feb 14, 2011 at 18:01, Gaurish Telang wrote: > 0 > 0 > 4 > 6 > > But with VecGetSize I get the vector length to be 2. !!! > Looks like the vector is printed twice, once when it was created and once after meaningful values were actually loaded. Does this still happen with the different loading model in petsc-dev? -------------- next part -------------- An HTML attachment was scrubbed... URL: From w_subber at yahoo.com Tue Feb 15 14:53:03 2011 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 15 Feb 2011 12:53:03 -0800 (PST) Subject: [petsc-users] Create a matrix from a set of vectors Message-ID: <911695.8147.qm@web38204.mail.mud.yahoo.com> Hello, Can I create a matrix from a set of vectors without using VecGetValues and MatSetValues such as Vec? v1, v2 Mat? A A=[v1 v2] Thanks Waad -------------- next part -------------- An HTML attachment was scrubbed... 
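One way to do this, in line with the reply that follows (build a dense matrix and copy each vector straight into a column through the raw array), sketched for the sequential case. The function name is invented, PETSc's column-major dense storage with leading dimension m is assumed, and PetscArraycpy is from newer releases (PetscMemcpy with a byte count is the older equivalent).

#include <petscmat.h>

/* Build a dense m x 2 matrix whose columns are v1 and v2 (sequential case). */
static PetscErrorCode VecsToDense(Vec v1, Vec v2, Mat *A)
{
  const PetscScalar *a1, *a2;
  PetscScalar       *mat;
  PetscInt           m;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = VecGetSize(v1, &m);CHKERRQ(ierr);
  ierr = MatCreateSeqDense(PETSC_COMM_SELF, m, 2, NULL, A);CHKERRQ(ierr);

  ierr = VecGetArrayRead(v1, &a1);CHKERRQ(ierr);
  ierr = VecGetArrayRead(v2, &a2);CHKERRQ(ierr);
  ierr = MatDenseGetArray(*A, &mat);CHKERRQ(ierr);
  ierr = PetscArraycpy(mat,     a1, m);CHKERRQ(ierr);   /* column 0 (column-major storage) */
  ierr = PetscArraycpy(mat + m, a2, m);CHKERRQ(ierr);   /* column 1 */
  ierr = MatDenseRestoreArray(*A, &mat);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(v1, &a1);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(v2, &a2);CHKERRQ(ierr);

  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For a parallel MATMPIDENSE matrix the same copy works on each process's local block of rows, and newer releases also provide MatDenseGetColumnVecWrite so a column can be filled with a plain VecCopy.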
URL: From knepley at gmail.com Tue Feb 15 15:12:59 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Feb 2011 15:12:59 -0600 Subject: [petsc-users] Create a matrix from a set of vectors In-Reply-To: <911695.8147.qm@web38204.mail.mud.yahoo.com> References: <911695.8147.qm@web38204.mail.mud.yahoo.com> Message-ID: On Tue, Feb 15, 2011 at 2:53 PM, Waad Subber wrote: > Hello, > Can I create a matrix from a set of vectors without using VecGetValues and > MatSetValues such as > > Vec v1, v2 > Mat A > > A=[v1 v2] > This is a dense matrix. You can create a MatDense and pull out arrays to shared. Matt > Thanks > Waad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bjha7333 at yahoo.com Tue Feb 15 18:56:26 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 16:56:26 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf Message-ID: <903282.93478.qm@web120515.mail.ne1.yahoo.com> Dear Petsc users, I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test Running test examples to verify correct installation --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licudata collect2: ld returned 1 exit status make[3]: [ex19] Error 1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F mpif90 -Wall -Wno-unused-variable -g -o ex5f 
ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licudata collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues for some time now. So it should be that, I suppose. I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: PetscMatlabEngine e; PetscScalar *array; array[0]=0; const char name[]="a"; PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); PetscMatlabEnginePutArray(e,1,1,array,name); PetscMatlabEngineGetArray(e,1,1,array,name); PetscMatlabEngineDestroy(e); Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. But, I get runtime error for mexPrintf: bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg Traceback (most recent call last): File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in from pylith.apps.PyLithApp import PyLithApp File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in from PetscApplication import PetscApplication File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in class PetscApplication(Application): File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication from pylith.utils.PetscManager import PetscManager File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in import pylith.utils.petsc as petsc File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in _petsc = swig_import_helper() File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper _mod = imp.load_module('_petsc', fp, pathname, description) ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf Can anyone help/suggest something? 
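One detail in the test snippet above that is worth fixing regardless of the linking problem: array is declared as a bare PetscScalar pointer and is written through before it points at any storage. A self-contained version of the same round-trip test is sketched below; it is only a sketch, assuming a PETSc build configured --with-matlab and run on a single process, and it follows the petsc-3.1-era calling conventions used elsewhere in this thread (later PETSc versions take &e in PetscMatlabEngineDestroy).

/* Minimal engine round-trip test (sketch; assumes --with-matlab). */
#include "petscsys.h"
#include "petscmatlab.h"   /* explicit include; may already be pulled in by the main PETSc headers */

int main(int argc,char **argv)
{
  PetscErrorCode    ierr;
  PetscMatlabEngine e;
  PetscScalar       array[1];          /* real storage, unlike the bare pointer above */

  array[0] = 0.0;
  ierr = PetscInitialize(&argc,&argv,(char*)0,(char*)0);CHKERRQ(ierr);
  ierr = PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e);CHKERRQ(ierr);
  ierr = PetscMatlabEnginePutArray(e,1,1,array,"a");CHKERRQ(ierr);
  ierr = PetscMatlabEngineEvaluate(e,"a = a + 1;");CHKERRQ(ierr);   /* run something in MATLAB so the round trip is visible */
  ierr = PetscMatlabEngineGetArray(e,1,1,array,"a");CHKERRQ(ierr);
  ierr = PetscMatlabEngineDestroy(e);CHKERRQ(ierr);                 /* petsc-3.1-era form used in this thread */
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The PetscMatlabEngineEvaluate call is included only so the test visibly does something; the undefined mexPrintf symbol reported in the traceback above is a link/load problem and is independent of the code itself.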
Thanks a lot Bir From bsmith at mcs.anl.gov Tue Feb 15 19:56:02 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 19:56:02 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <903282.93478.qm@web120515.mail.ne1.yahoo.com> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> Message-ID: <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > Dear Petsc users, > > I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test > Running test examples to verify correct installation > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licudata > collect2: ld returned 1 exit status > make[3]: [ex19] Error 1 (ignored) > /bin/rm -f ex19.o > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F > mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib 
-L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licudata > collect2: ld returned 1 exit status > make[3]: [ex5f] Error 1 (ignored) > /bin/rm -f ex5f.o > Completed test examples > > > It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues > for some time now. So it should be that, I suppose. > > I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: > > PetscMatlabEngine e; > PetscScalar *array; array[0]=0; > const char name[]="a"; > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > PetscMatlabEnginePutArray(e,1,1,array,name); > PetscMatlabEngineGetArray(e,1,1,array,name); > PetscMatlabEngineDestroy(e); > > Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? No > Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. > > But, I get runtime error for mexPrintf: > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg > Traceback (most recent call last): > File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in > from pylith.apps.PyLithApp import PyLithApp > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in > from PetscApplication import PetscApplication > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in > class PetscApplication(Application): > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication > from pylith.utils.PetscManager import PetscManager > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in > import pylith.utils.petsc as petsc > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in > _petsc = swig_import_helper() > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper > _mod = imp.load_module('_petsc', fp, pathname, description) > ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? Barry > > > Can anyone help/suggest something? > > Thanks a lot > Bir > > > From bjha7333 at yahoo.com Tue Feb 15 20:14:11 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 18:14:11 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf Message-ID: <787767.48972.qm@web120505.mail.ne1.yahoo.com> Hi, I attached the output of ls -l. 
Below are the outputs of "file" command: bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: symbolic link to `libicudata.so.42.1' bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped Thanks Bir --- On Wed, 2/16/11, Barry Smith wrote: > From: Barry Smith > Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf > To: "PETSc users list" > Date: Wednesday, February 16, 2011, 7:26 AM > > On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > > > Dear Petsc users, > > > > I am getting "cannot find -licudata" error during > "make test" on petsc-dev, even when libicudata.so.42.1, and > its link, libicudata.so.42 are in > /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. > > ? Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and > send the output > also run file? > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > > > > bjha at ubuntu:~/src/petsc-dev$ make > PETSC_DIR=/home/bjha/src/petsc-dev > PETSC_ARCH=linux_gcc-4.4.1_64 test > > Running test examples to verify correct installation > > --------------Error detected during compile or > link!----------------------- > > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > > mpicxx -o ex19.o -c -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g > -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 > -I/home/bjha/src/petsc-dev/include > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > -I/home/bjha/src/petsc-dev/include/sieve > -I/home/bjha/MATLAB/R2010b/extern/include > -I/home/bjha/tools/gcc-4.4.1_64/include > -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -g???-o ex19? > ex19.o > -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib? > -lpetsc > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > -lparmetis -lmetis > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > -L/home/bjha/MATLAB/R2010b/bin/glnx86 > -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex > -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco > -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas > -L/home/bjha/tools/gcc-4.4.1_64/lib > -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal > -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 > -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx > -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil > -lgcc_s -lpthread -ldl > > /usr/bin/ld: cannot find -licudata > > collect2: ld returned 1 exit status > > make[3]: [ex19] Error 1 (ignored) > > /bin/rm -f ex19.o > > --------------Error detected during compile or > link!----------------------- > > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > > mpif90 -c? -Wall -Wno-unused-variable > -g???-I/home/bjha/src/petsc-dev/include > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > -I/home/bjha/src/petsc-dev/include/sieve > -I/home/bjha/MATLAB/R2010b/extern/include > -I/home/bjha/tools/gcc-4.4.1_64/include? ? -o > ex5f.o ex5f.F > > mpif90 -Wall -Wno-unused-variable > -g???-o ex5f ex5f.o > -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib? 
> -lpetsc > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > -lparmetis -lmetis > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > -L/home/bjha/MATLAB/R2010b/bin/glnx86 > -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex > -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco > -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas > -L/home/bjha/tools/gcc-4.4.1_64/lib > -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal > -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 > -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx > -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil > -lgcc_s -lpthread -ldl > > /usr/bin/ld: cannot find -licudata > > collect2: ld returned 1 exit status > > make[3]: [ex5f] Error 1 (ignored) > > /bin/rm -f ex5f.o > > Completed test examples > > > > > > It is correct that I didn't install matlab in the > default directory (/usr/local) because I had some permission > issues on Ubuntu. But I have been running matlab (by running > /MATLAB/R2010b/bin/matlab.sh) without any issues > > for some time now. So it should be that, I suppose. > > > > I still went ahead with compiling my application with > few lines of PetscMatlabEngine functions, just to test: > > > >? PetscMatlabEngine e; > >? PetscScalar *array; array[0]=0; > >? const char name[]="a"; > >? > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > >? PetscMatlabEnginePutArray(e,1,1,array,name); > >? PetscMatlabEngineGetArray(e,1,1,array,name); > >? PetscMatlabEngineDestroy(e); > > > > Do I need to include any header file (e.g. > petscmatlab.h) in the header of my class file? > > ? No > > > Right now, the application compiled (make, make > install) fine without any such include file. The application > have been using Petsc for its solver without any issues, so > it includes all the necessary files. I just want to exten > the application to call some matlab scripts by using > PetscMatlabEngine. > > > > But, I get runtime error for mexPrintf: > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith > step06_pres.cfg > > Traceback (most recent call last): > >? File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", > line 37, in > >? ? from pylith.apps.PyLithApp import > PyLithApp > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", > line 23, in > >? ? from PetscApplication import > PetscApplication > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > line 27, in > >? ? class PetscApplication(Application): > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > line 41, in PetscApplication > >? ? from pylith.utils.PetscManager import > PetscManager > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", > line 29, in > >? ? import pylith.utils.petsc as petsc > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > line 25, in > >? ? _petsc = swig_import_helper() > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > line 21, in swig_import_helper > >? ? _mod = imp.load_module('_petsc', fp, > pathname, description) > > ImportError: > /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined > symbol: mexPrintf > > ? 
Somehow all the Matlab shared libraries need to be > found when python loads libpylith.so is loaded.? I > don't know how this is done in Linux. It is really a python > question if you want to use a dynamic library in python that > uses another shared library how do you make sure python gets > all the shared libraries loaded to resolve the symbols? > > ???Barry > > > > > > > Can anyone help/suggest something? > > > > Thanks a lot > > Bir > > > > > > > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: MATLAB_R2010b_bin_glnx86_files.txt URL: From bsmith at mcs.anl.gov Tue Feb 15 20:25:11 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 20:25:11 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <787767.48972.qm@web120505.mail.ne1.yahoo.com> References: <787767.48972.qm@web120505.mail.ne1.yahoo.com> Message-ID: <426D3650-BA63-48A7-A8ED-09D9AB96D8E3@mcs.anl.gov> On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: > Hi, > > I attached the output of ls -l. Below are the outputs of "file" command: > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: symbolic link to `libicudata.so.42.1' > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped Humm. Run file on some of the other libraries in that directory. Are they also ELF 32 bit? You can try editing ${PETSC_ARCH/conf/petscvariables and removing the reference to libicudata then run make test. It may not be needed. Barry > > > Thanks > Bir > > --- On Wed, 2/16/11, Barry Smith wrote: > >> From: Barry Smith >> Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf >> To: "PETSc users list" >> Date: Wednesday, February 16, 2011, 7:26 AM >> >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >> >>> Dear Petsc users, >>> >>> I am getting "cannot find -licudata" error during >> "make test" on petsc-dev, even when libicudata.so.42.1, and >> its link, libicudata.so.42 are in >> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. 
>> >> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and >> send the output >> also run file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >> >>> >>> bjha at ubuntu:~/src/petsc-dev$ make >> PETSC_DIR=/home/bjha/src/petsc-dev >> PETSC_ARCH=linux_gcc-4.4.1_64 test >>> Running test examples to verify correct installation >>> --------------Error detected during compile or >> link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings >> -Wno-strict-aliasing -Wno-unknown-pragmas -g >> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 >> -I/home/bjha/src/petsc-dev/include >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >> -I/home/bjha/src/petsc-dev/include/sieve >> -I/home/bjha/MATLAB/R2010b/extern/include >> -I/home/bjha/tools/gcc-4.4.1_64/include >> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing >> -Wno-unknown-pragmas -g -o ex19 >> ex19.o >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lpetsc >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lparmetis -lmetis >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas >> -L/home/bjha/tools/gcc-4.4.1_64/lib >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil >> -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex19] Error 1 (ignored) >>> /bin/rm -f ex19.o >>> --------------Error detected during compile or >> link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpif90 -c -Wall -Wno-unused-variable >> -g -I/home/bjha/src/petsc-dev/include >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >> -I/home/bjha/src/petsc-dev/include/sieve >> -I/home/bjha/MATLAB/R2010b/extern/include >> -I/home/bjha/tools/gcc-4.4.1_64/include -o >> ex5f.o ex5f.F >>> mpif90 -Wall -Wno-unused-variable >> -g -o ex5f ex5f.o >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lpetsc >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lparmetis -lmetis >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas >> -L/home/bjha/tools/gcc-4.4.1_64/lib >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil >> -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex5f] Error 1 (ignored) >>> /bin/rm -f ex5f.o >>> Completed test examples >>> >>> >>> It is correct that I didn't install matlab in the >> default 
directory (/usr/local) because I had some permission >> issues on Ubuntu. But I have been running matlab (by running >> /MATLAB/R2010b/bin/matlab.sh) without any issues >>> for some time now. So it should be that, I suppose. >>> >>> I still went ahead with compiling my application with >> few lines of PetscMatlabEngine functions, just to test: >>> >>> PetscMatlabEngine e; >>> PetscScalar *array; array[0]=0; >>> const char name[]="a"; >>> >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>> PetscMatlabEnginePutArray(e,1,1,array,name); >>> PetscMatlabEngineGetArray(e,1,1,array,name); >>> PetscMatlabEngineDestroy(e); >>> >>> Do I need to include any header file (e.g. >> petscmatlab.h) in the header of my class file? >> >> No >> >>> Right now, the application compiled (make, make >> install) fine without any such include file. The application >> have been using Petsc for its solver without any issues, so >> it includes all the necessary files. I just want to exten >> the application to call some matlab scripts by using >> PetscMatlabEngine. >>> >>> But, I get runtime error for mexPrintf: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith >> step06_pres.cfg >>> Traceback (most recent call last): >>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", >> line 37, in >>> from pylith.apps.PyLithApp import >> PyLithApp >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", >> line 23, in >>> from PetscApplication import >> PetscApplication >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >> line 27, in >>> class PetscApplication(Application): >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >> line 41, in PetscApplication >>> from pylith.utils.PetscManager import >> PetscManager >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", >> line 29, in >>> import pylith.utils.petsc as petsc >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >> line 25, in >>> _petsc = swig_import_helper() >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >> line 21, in swig_import_helper >>> _mod = imp.load_module('_petsc', fp, >> pathname, description) >>> ImportError: >> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined >> symbol: mexPrintf >> >> Somehow all the Matlab shared libraries need to be >> found when python loads libpylith.so is loaded. I >> don't know how this is done in Linux. It is really a python >> question if you want to use a dynamic library in python that >> uses another shared library how do you make sure python gets >> all the shared libraries loaded to resolve the symbols? >> >> Barry >> >>> >>> >>> Can anyone help/suggest something? 
>>> >>> Thanks a lot >>> Bir >>> >>> >>> >> >> > > > From baagaard at usgs.gov Tue Feb 15 20:44:43 2011 From: baagaard at usgs.gov (Brad Aagaard) Date: Tue, 15 Feb 2011 18:44:43 -0800 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> Message-ID: <4D5B3A1B.3040607@usgs.gov> Birendra- PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. Brad On 2/15/11 5:56 PM, Barry Smith wrote: > > On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > >> Dear Petsc users, >> >> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. > > Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output > also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > >> >> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >> Running test examples to verify correct installation >> --------------Error detected during compile or link!----------------------- >> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >> /usr/bin/ld: cannot find -licudata >> collect2: ld returned 1 exit status >> make[3]: [ex19] Error 1 (ignored) >> /bin/rm -f ex19.o >> --------------Error detected during compile or link!----------------------- >> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >> mpif90 -Wall 
-Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >> /usr/bin/ld: cannot find -licudata >> collect2: ld returned 1 exit status >> make[3]: [ex5f] Error 1 (ignored) >> /bin/rm -f ex5f.o >> Completed test examples >> >> >> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >> for some time now. So it should be that, I suppose. >> >> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >> >> PetscMatlabEngine e; >> PetscScalar *array; array[0]=0; >> const char name[]="a"; >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >> PetscMatlabEnginePutArray(e,1,1,array,name); >> PetscMatlabEngineGetArray(e,1,1,array,name); >> PetscMatlabEngineDestroy(e); >> >> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? > > No > >> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. >> >> But, I get runtime error for mexPrintf: >> >> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >> Traceback (most recent call last): >> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >> from pylith.apps.PyLithApp import PyLithApp >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >> from PetscApplication import PetscApplication >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >> class PetscApplication(Application): >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >> from pylith.utils.PetscManager import PetscManager >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >> import pylith.utils.petsc as petsc >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >> _petsc = swig_import_helper() >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >> _mod = imp.load_module('_petsc', fp, pathname, description) >> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf > > Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. 
It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? > > Barry > >> >> >> Can anyone help/suggest something? >> >> Thanks a lot >> Bir >> >> >> > > From bsmith at mcs.anl.gov Tue Feb 15 21:00:30 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 21:00:30 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <4D5B3A1B.3040607@usgs.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> <4D5B3A1B.3040607@usgs.gov> Message-ID: <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> The MATLAB libraries are listed in the $PETSC_ARCH/conf/petscvariables file in the variable PETSC_EXTERNAL_LIB_BASIC This was changed recently in PETSc-dev perhaps the pylith configure has not yet been updated to handle this. Barry On Feb 15, 2011, at 8:44 PM, Brad Aagaard wrote: > Birendra- > > PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. > > Brad > > > On 2/15/11 5:56 PM, Barry Smith wrote: >> >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >> >>> Dear Petsc users, >>> >>> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. 
>> >> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output >> also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >> >>> >>> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >>> Running test examples to verify correct installation >>> --------------Error detected during compile or link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex19] Error 1 (ignored) >>> /bin/rm -f ex19.o >>> --------------Error detected during compile or link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >>> mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex5f] Error 1 (ignored) >>> /bin/rm -f ex5f.o >>> Completed test examples >>> >>> >>> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. 
But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >>> for some time now. So it should be that, I suppose. >>> >>> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >>> >>> PetscMatlabEngine e; >>> PetscScalar *array; array[0]=0; >>> const char name[]="a"; >>> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>> PetscMatlabEnginePutArray(e,1,1,array,name); >>> PetscMatlabEngineGetArray(e,1,1,array,name); >>> PetscMatlabEngineDestroy(e); >>> >>> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? >> >> No >> >>> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. >>> >>> But, I get runtime error for mexPrintf: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >>> Traceback (most recent call last): >>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >>> from pylith.apps.PyLithApp import PyLithApp >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >>> from PetscApplication import PetscApplication >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >>> class PetscApplication(Application): >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >>> from pylith.utils.PetscManager import PetscManager >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >>> import pylith.utils.petsc as petsc >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >>> _petsc = swig_import_helper() >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >>> _mod = imp.load_module('_petsc', fp, pathname, description) >>> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf >> >> Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? >> >> Barry >> >>> >>> >>> Can anyone help/suggest something? >>> >>> Thanks a lot >>> Bir >>> >>> >>> >> >> > From baagaard at usgs.gov Tue Feb 15 21:51:36 2011 From: baagaard at usgs.gov (Brad Aagaard) Date: Tue, 15 Feb 2011 19:51:36 -0800 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> <4D5B3A1B.3040607@usgs.gov> <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> Message-ID: <4D5B49C8.10102@usgs.gov> Barry- In my current petsc-dev $PETSC_ARCH/conf/petscvariables, I have PETSC_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB} Shouldn't PETSC_LIB include all the libraries and paths we need for linking? 
Thanks, Brad On 2/15/11 7:00 PM, Barry Smith wrote: > > The MATLAB libraries are listed in the $PETSC_ARCH/conf/petscvariables file in the variable PETSC_EXTERNAL_LIB_BASIC This was changed recently in PETSc-dev perhaps the pylith configure has not yet been updated to handle this. > > Barry > > On Feb 15, 2011, at 8:44 PM, Brad Aagaard wrote: > >> Birendra- >> >> PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. >> >> Brad >> >> >> On 2/15/11 5:56 PM, Barry Smith wrote: >>> >>> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >>> >>>> Dear Petsc users, >>>> >>>> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. >>> >>> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output >>> also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>> >>>> >>>> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >>>> Running test examples to verify correct installation >>>> --------------Error detected during compile or link!----------------------- >>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>>> /usr/bin/ld: cannot find -licudata >>>> collect2: ld returned 1 exit status >>>> make[3]: [ex19] Error 1 (ignored) >>>> /bin/rm -f ex19.o >>>> --------------Error detected during compile or link!----------------------- >>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >>>> mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o 
-L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>>> /usr/bin/ld: cannot find -licudata >>>> collect2: ld returned 1 exit status >>>> make[3]: [ex5f] Error 1 (ignored) >>>> /bin/rm -f ex5f.o >>>> Completed test examples >>>> >>>> >>>> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >>>> for some time now. So it should be that, I suppose. >>>> >>>> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >>>> >>>> PetscMatlabEngine e; >>>> PetscScalar *array; array[0]=0; >>>> const char name[]="a"; >>>> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>>> PetscMatlabEnginePutArray(e,1,1,array,name); >>>> PetscMatlabEngineGetArray(e,1,1,array,name); >>>> PetscMatlabEngineDestroy(e); >>>> >>>> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? >>> >>> No >>> >>>> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. 
>>>> >>>> But, I get runtime error for mexPrintf: >>>> >>>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >>>> Traceback (most recent call last): >>>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >>>> from pylith.apps.PyLithApp import PyLithApp >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >>>> from PetscApplication import PetscApplication >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >>>> class PetscApplication(Application): >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >>>> from pylith.utils.PetscManager import PetscManager >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >>>> import pylith.utils.petsc as petsc >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >>>> _petsc = swig_import_helper() >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >>>> _mod = imp.load_module('_petsc', fp, pathname, description) >>>> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf >>> >>> Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? >>> >>> Barry >>> >>>> >>>> >>>> Can anyone help/suggest something? >>>> >>>> Thanks a lot >>>> Bir >>>> >>>> >>>> >>> >>> >> > > From bjha7333 at yahoo.com Wed Feb 16 01:35:00 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 23:35:00 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <426D3650-BA63-48A7-A8ED-09D9AB96D8E3@mcs.anl.gov> Message-ID: <856558.58586.qm@web120507.mail.ne1.yahoo.com> Hi, I removed -licudata at three places in petscvariables. 
The error just shifted to -licui18n: bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test Running test examples to verify correct installation --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licui18n collect2: ld returned 1 exit status make[3]: [ex19] Error 1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licui18n collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples I removed its references, error shifted to -licuuc, removed its references, error became: ... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib/libpetsc.a(aij.o): In function `MatCreate_SeqAIJ': /home/bjha/src/petsc-dev/src/mat/impls/aij/seq/aij.c:3492: undefined reference to `MatGetFactor_seqaij_matlab' ... It seems a linking issue. 
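A quick way to see exactly what the dynamic loader objects to (a library it cannot find, an unresolved symbol such as mexPrintf, or a 32-bit library being loaded into a 64-bit process, which glibc reports as "wrong ELF class: ELFCLASS32") is to dlopen the libraries directly and print dlerror(). The small stand-alone diagnostic sketched below does that; it is not part of PETSc or PyLith, and the build line and paths in the comments are only examples.

/* dltest.c: try to dlopen each library named on the command line, the same
   way Python does when it imports a module, and report the loader's error.
   Build (example): cc -o dltest dltest.c -ldl */
#include <stdio.h>
#include <dlfcn.h>

int main(int argc,char **argv)
{
  int i;
  for (i = 1; i < argc; i++) {
    /* RTLD_GLOBAL makes symbols from libraries loaded earlier on the command
       line (e.g. libmex.so) visible to the ones loaded after them */
    void *handle = dlopen(argv[i], RTLD_NOW | RTLD_GLOBAL);
    if (!handle) fprintf(stderr, "FAILED %s\n  %s\n", argv[i], dlerror());
    else         printf("loaded %s\n", argv[i]);
  }
  return 0;
}

Running it on /home/bjha/MATLAB/R2010b/bin/glnx86/libmex.so followed by libpylith.so.0 should show whether the mexPrintf reference can be resolved once the MATLAB library is loaded, or whether the loader refuses the 32-bit MATLAB libraries outright.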
I checked "file" for few other libraries--they are all ELF 32-bit: bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwblas.so libmwblas.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libjogl.so libjogl.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwfftw.so libmwfftw.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libeng.so libeng.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped PyLith configure also gives Petsc linking error: checking for PETSc dir... /home/bjha/src/petsc-dev checking for PETSc arch... linux_gcc-4.4.1_64 checking for PETSc config... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/conf/petscvariables checking for PETSc version == 3.1.0... yes checking for PetscInitialize... no checking for the libraries used by mpicc... -pthread -L/home/bjha/tools/gcc-4.4.1_64/lib -lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -lm -ldl checking for PetscInitialize... no configure: error: cannot link against PETSc libraries Please help. Thanks & regards Bir --- On Wed, 2/16/11, Barry Smith wrote: > From: Barry Smith > Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf > To: "PETSc users list" > Date: Wednesday, February 16, 2011, 7:55 AM > > On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: > > > Hi, > > > > I attached the output of ls -l. Below are the outputs > of "file" command: > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: > symbolic link to `libicudata.so.42.1' > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 > > > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF > 32-bit LSB shared object, Intel 80386, version 1 (SYSV), > dynamically linked, not stripped > > ???Humm. Run file on some of the other > libraries in that directory.? Are they also ELF 32 bit? > You can try editing ${PETSC_ARCH/conf/petscvariables and > removing the reference to libicudata then run make test. It > may not be needed. > > ? Barry > > > > > > > Thanks > > Bir > > > > --- On Wed, 2/16/11, Barry Smith > wrote: > > > >> From: Barry Smith > >> Subject: Re: [petsc-users] Petscmatlabengine, > libicudata not found, undefined symbol mexPrintf > >> To: "PETSc users list" > >> Date: Wednesday, February 16, 2011, 7:26 AM > >> > >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > >> > >>> Dear Petsc users, > >>> > >>> I am getting "cannot find -licudata" error > during > >> "make test" on petsc-dev, even when > libicudata.so.42.1, and > >> its link, libicudata.so.42 are in > >> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" > was fine. 
> >> > >>???Run ls -l > /home/bjha/MATLAB/R2010b/bin/glnx86 and > >> send the output > >> also run file > >> > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > >> > >>> > >>> bjha at ubuntu:~/src/petsc-dev$ make > >> PETSC_DIR=/home/bjha/src/petsc-dev > >> PETSC_ARCH=linux_gcc-4.4.1_64 test > >>> Running test examples to verify correct > installation > >>> --------------Error detected during compile > or > >> link!----------------------- > >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings > >> -Wno-strict-aliasing -Wno-unknown-pragmas -g > >> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 > >> -I/home/bjha/src/petsc-dev/include > >> > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > >> -I/home/bjha/src/petsc-dev/include/sieve > >> -I/home/bjha/MATLAB/R2010b/extern/include > >> -I/home/bjha/tools/gcc-4.4.1_64/include > >> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > >>> mpicxx -Wall -Wwrite-strings > -Wno-strict-aliasing > >> -Wno-unknown-pragmas -g???-o ex19 > >> ex19.o > >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > > >> -lpetsc > >> > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > >> -lparmetis -lmetis > >> > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 > >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng > -lmex > >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml > -lchaco > >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas > -lblas > >> -L/home/bjha/tools/gcc-4.4.1_64/lib > >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte > -lopen-pal > >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 > -lmpi_f77 > >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ > -lmpi_cxx > >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl > -lutil > >> -lgcc_s -lpthread -ldl > >>> /usr/bin/ld: cannot find -licudata > >>> collect2: ld returned 1 exit status > >>> make[3]: [ex19] Error 1 (ignored) > >>> /bin/rm -f ex19.o > >>> --------------Error detected during compile > or > >> link!----------------------- > >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > >>> mpif90 -c? -Wall -Wno-unused-variable > >> > -g???-I/home/bjha/src/petsc-dev/include > >> > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > >> -I/home/bjha/src/petsc-dev/include/sieve > >> -I/home/bjha/MATLAB/R2010b/extern/include > >> -I/home/bjha/tools/gcc-4.4.1_64/include? > ? 
-o > >> ex5f.o ex5f.F > >>> mpif90 -Wall -Wno-unused-variable > >> -g???-o ex5f ex5f.o > >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > > >> -lpetsc > >> > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > >> -lparmetis -lmetis > >> > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 > >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng > -lmex > >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml > -lchaco > >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas > -lblas > >> -L/home/bjha/tools/gcc-4.4.1_64/lib > >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte > -lopen-pal > >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 > -lmpi_f77 > >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ > -lmpi_cxx > >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl > -lutil > >> -lgcc_s -lpthread -ldl > >>> /usr/bin/ld: cannot find -licudata > >>> collect2: ld returned 1 exit status > >>> make[3]: [ex5f] Error 1 (ignored) > >>> /bin/rm -f ex5f.o > >>> Completed test examples > >>> > >>> > >>> It is correct that I didn't install matlab in > the > >> default directory (/usr/local) because I had some > permission > >> issues on Ubuntu. But I have been running matlab > (by running > >> /MATLAB/R2010b/bin/matlab.sh) without any issues > >>> for some time now. So it should be that, I > suppose. > >>> > >>> I still went ahead with compiling my > application with > >> few lines of PetscMatlabEngine functions, just to > test: > >>> > >>>???PetscMatlabEngine e; > >>>???PetscScalar *array; > array[0]=0; > >>>???const char name[]="a"; > >>>? > >> > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > >>>???PetscMatlabEnginePutArray(e,1,1,array,name); > >>>???PetscMatlabEngineGetArray(e,1,1,array,name); > >>>???PetscMatlabEngineDestroy(e); > >>> > >>> Do I need to include any header file (e.g. > >> petscmatlab.h) in the header of my class file? > >> > >>???No > >> > >>> Right now, the application compiled (make, > make > >> install) fine without any such include file. The > application > >> have been using Petsc for its solver without any > issues, so > >> it includes all the necessary files. I just want > to exten > >> the application to call some matlab scripts by > using > >> PetscMatlabEngine. > >>> > >>> But, I get runtime error for mexPrintf: > >>> > >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ > pylith > >> step06_pres.cfg > >>> Traceback (most recent call last): > >>>???File > "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", > >> line 37, in > >>>? ???from > pylith.apps.PyLithApp import > >> PyLithApp > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", > >> line 23, in > >>>? ???from PetscApplication > import > >> PetscApplication > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > >> line 27, in > >>>? ???class > PetscApplication(Application): > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > >> line 41, in PetscApplication > >>>? ???from > pylith.utils.PetscManager import > >> PetscManager > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", > >> line 29, in > >>>? 
???import > pylith.utils.petsc as petsc > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > >> line 25, in > >>>? ???_petsc = > swig_import_helper() > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > >> line 21, in swig_import_helper > >>>? ???_mod = > imp.load_module('_petsc', fp, > >> pathname, description) > >>> ImportError: > >> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: > undefined > >> symbol: mexPrintf > >> > >>???Somehow all the Matlab shared > libraries need to be > >> found when python loads libpylith.so is > loaded.? I > >> don't know how this is done in Linux. It is really > a python > >> question if you want to use a dynamic library in > python that > >> uses another shared library how do you make sure > python gets > >> all the shared libraries loaded to resolve the > symbols? > >> > >>? ? Barry > >> > >>> > >>> > >>> Can anyone help/suggest something? > >>> > >>> Thanks a lot > >>> Bir > >>> > >>> > >>> > >> > >> > > > > > > > > From bsmith at mcs.anl.gov Wed Feb 16 07:57:42 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 07:57:42 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <856558.58586.qm@web120507.mail.ne1.yahoo.com> References: <856558.58586.qm@web120507.mail.ne1.yahoo.com> Message-ID: On Feb 16, 2011, at 1:35 AM, Birendra jha wrote: > Hi, > > I removed -licudata at three places in petscvariables. The error just shifted to -licui18n: > > bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test > Running test examples to verify correct installation > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licui18n > collect2: ld returned 1 exit status > make[3]: [ex19] Error 1 (ignored) > /bin/rm -f ex19.o > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include 
-I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F > mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licui18n > collect2: ld returned 1 exit status > make[3]: [ex5f] Error 1 (ignored) > /bin/rm -f ex5f.o > Completed test examples > > > I removed its references, error shifted to -licuuc, removed its references, error became: > ... > /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib/libpetsc.a(aij.o): In function `MatCreate_SeqAIJ': > /home/bjha/src/petsc-dev/src/mat/impls/aij/seq/aij.c:3492: undefined reference to `MatGetFactor_seqaij_matlab' Send configure.log and make.log to petsc-maint at mcs.anl.gov Looks like those two libraries are not needed, but something went wrong with the PETSc compile. Barry > ... > > It seems a linking issue. > > I checked "file" for few other libraries--they are all ELF 32-bit: > > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwblas.so > libmwblas.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libjogl.so > libjogl.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwfftw.so > libmwfftw.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libeng.so > libeng.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > > PyLith configure also gives Petsc linking error: > > checking for PETSc dir... /home/bjha/src/petsc-dev > checking for PETSc arch... linux_gcc-4.4.1_64 > checking for PETSc config... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/conf/petscvariables > checking for PETSc version == 3.1.0... yes > checking for PetscInitialize... no > checking for the libraries used by mpicc... -pthread -L/home/bjha/tools/gcc-4.4.1_64/lib -lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -lm -ldl > checking for PetscInitialize... no > configure: error: cannot link against PETSc libraries > > Please help. > > Thanks & regards > Bir > > --- On Wed, 2/16/11, Barry Smith wrote: > >> From: Barry Smith >> Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf >> To: "PETSc users list" >> Date: Wednesday, February 16, 2011, 7:55 AM >> >> On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: >> >>> Hi, >>> >>> I attached the output of ls -l. 
Below are the outputs >> of "file" command: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: >> symbolic link to `libicudata.so.42.1' >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 >>> >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF >> 32-bit LSB shared object, Intel 80386, version 1 (SYSV), >> dynamically linked, not stripped >> >> Humm. Run file on some of the other >> libraries in that directory. Are they also ELF 32 bit? >> You can try editing ${PETSC_ARCH/conf/petscvariables and >> removing the reference to libicudata then run make test. It >> may not be needed. >> >> Barry >> >>> >>> >>> Thanks >>> Bir >>> >>> --- On Wed, 2/16/11, Barry Smith >> wrote: >>> >>>> From: Barry Smith >>>> Subject: Re: [petsc-users] Petscmatlabengine, >> libicudata not found, undefined symbol mexPrintf >>>> To: "PETSc users list" >>>> Date: Wednesday, February 16, 2011, 7:26 AM >>>> >>>> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >>>> >>>>> Dear Petsc users, >>>>> >>>>> I am getting "cannot find -licudata" error >> during >>>> "make test" on petsc-dev, even when >> libicudata.so.42.1, and >>>> its link, libicudata.so.42 are in >>>> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" >> was fine. >>>> >>>> Run ls -l >> /home/bjha/MATLAB/R2010b/bin/glnx86 and >>>> send the output >>>> also run file >>>> >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>>> >>>>> >>>>> bjha at ubuntu:~/src/petsc-dev$ make >>>> PETSC_DIR=/home/bjha/src/petsc-dev >>>> PETSC_ARCH=linux_gcc-4.4.1_64 test >>>>> Running test examples to verify correct >> installation >>>>> --------------Error detected during compile >> or >>>> link!----------------------- >>>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>>> mpicxx -o ex19.o -c -Wall -Wwrite-strings >>>> -Wno-strict-aliasing -Wno-unknown-pragmas -g >>>> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 >>>> -I/home/bjha/src/petsc-dev/include >>>> >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >>>> -I/home/bjha/src/petsc-dev/include/sieve >>>> -I/home/bjha/MATLAB/R2010b/extern/include >>>> -I/home/bjha/tools/gcc-4.4.1_64/include >>>> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>>>> mpicxx -Wall -Wwrite-strings >> -Wno-strict-aliasing >>>> -Wno-unknown-pragmas -g -o ex19 >>>> ex19.o >>>> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> >>>> -lpetsc >>>> >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >>>> -lparmetis -lmetis >>>> >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng >> -lmex >>>> -lmx -lmat -lut -licudata -licui18n -licuuc -lml >> -lchaco >>>> -L/usr/lib/atlas -llapack_atlas -llapack -latlas >> -lblas >>>> -L/home/bjha/tools/gcc-4.4.1_64/lib >>>> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >>>> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte >> -lopen-pal >>>> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 >> -lmpi_f77 >>>> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ >> -lmpi_cxx >>>> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl >> -lutil >>>> -lgcc_s -lpthread -ldl >>>>> /usr/bin/ld: cannot find -licudata >>>>> collect2: ld returned 1 exit status >>>>> make[3]: [ex19] Error 1 (ignored) >>>>> /bin/rm -f ex19.o >>>>> 
--------------Error detected during compile >> or >>>> link!----------------------- >>>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>>> mpif90 -c -Wall -Wno-unused-variable >>>> >> -g -I/home/bjha/src/petsc-dev/include >>>> >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >>>> -I/home/bjha/src/petsc-dev/include/sieve >>>> -I/home/bjha/MATLAB/R2010b/extern/include >>>> -I/home/bjha/tools/gcc-4.4.1_64/include >> -o >>>> ex5f.o ex5f.F >>>>> mpif90 -Wall -Wno-unused-variable >>>> -g -o ex5f ex5f.o >>>> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> >>>> -lpetsc >>>> >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >>>> -lparmetis -lmetis >>>> >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng >> -lmex >>>> -lmx -lmat -lut -licudata -licui18n -licuuc -lml >> -lchaco >>>> -L/usr/lib/atlas -llapack_atlas -llapack -latlas >> -lblas >>>> -L/home/bjha/tools/gcc-4.4.1_64/lib >>>> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >>>> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte >> -lopen-pal >>>> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 >> -lmpi_f77 >>>> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ >> -lmpi_cxx >>>> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl >> -lutil >>>> -lgcc_s -lpthread -ldl >>>>> /usr/bin/ld: cannot find -licudata >>>>> collect2: ld returned 1 exit status >>>>> make[3]: [ex5f] Error 1 (ignored) >>>>> /bin/rm -f ex5f.o >>>>> Completed test examples >>>>> >>>>> >>>>> It is correct that I didn't install matlab in >> the >>>> default directory (/usr/local) because I had some >> permission >>>> issues on Ubuntu. But I have been running matlab >> (by running >>>> /MATLAB/R2010b/bin/matlab.sh) without any issues >>>>> for some time now. So it should be that, I >> suppose. >>>>> >>>>> I still went ahead with compiling my >> application with >>>> few lines of PetscMatlabEngine functions, just to >> test: >>>>> >>>>> PetscMatlabEngine e; >>>>> PetscScalar *array; >> array[0]=0; >>>>> const char name[]="a"; >>>>> >>>> >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>>>> PetscMatlabEnginePutArray(e,1,1,array,name); >>>>> PetscMatlabEngineGetArray(e,1,1,array,name); >>>>> PetscMatlabEngineDestroy(e); >>>>> >>>>> Do I need to include any header file (e.g. >>>> petscmatlab.h) in the header of my class file? >>>> >>>> No >>>> >>>>> Right now, the application compiled (make, >> make >>>> install) fine without any such include file. The >> application >>>> have been using Petsc for its solver without any >> issues, so >>>> it includes all the necessary files. I just want >> to exten >>>> the application to call some matlab scripts by >> using >>>> PetscMatlabEngine. 
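A side note on the small test fragment quoted in this thread: "PetscScalar *array; array[0]=0;" writes through an uninitialized pointer before the engine is ever created, which is undefined behavior independent of the link problem. A minimal corrected sketch follows; the calls and their argument order are kept exactly as quoted (the petsc-dev of that time), only real storage and the usual error checking are added, and the function name is illustrative.

#include <petsc.h>
#include <petscmatlab.h>   /* PetscMatlabEngine prototypes; may already be pulled in via petsc.h */

/* Same round-trip test as quoted above, but with real storage for the scalar
   instead of an uninitialized pointer. */
PetscErrorCode TestMatlabEngine(void)
{
  PetscMatlabEngine e;
  PetscScalar       array[1] = {0.0};
  const char        name[]   = "a";
  PetscErrorCode    ierr;

  ierr = PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e);CHKERRQ(ierr);
  ierr = PetscMatlabEnginePutArray(e,1,1,array,name);CHKERRQ(ierr);
  ierr = PetscMatlabEngineGetArray(e,1,1,array,name);CHKERRQ(ierr);
  ierr = PetscMatlabEngineDestroy(e);CHKERRQ(ierr);
  return 0;
}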
>>>>> >>>>> But, I get runtime error for mexPrintf: >>>>> >>>>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ >> pylith >>>> step06_pres.cfg >>>>> Traceback (most recent call last): >>>>> File >> "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", >>>> line 37, in >>>>> from >> pylith.apps.PyLithApp import >>>> PyLithApp >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", >>>> line 23, in >>>>> from PetscApplication >> import >>>> PetscApplication >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >>>> line 27, in >>>>> class >> PetscApplication(Application): >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >>>> line 41, in PetscApplication >>>>> from >> pylith.utils.PetscManager import >>>> PetscManager >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", >>>> line 29, in >>>>> import >> pylith.utils.petsc as petsc >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >>>> line 25, in >>>>> _petsc = >> swig_import_helper() >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >>>> line 21, in swig_import_helper >>>>> _mod = >> imp.load_module('_petsc', fp, >>>> pathname, description) >>>>> ImportError: >>>> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: >> undefined >>>> symbol: mexPrintf >>>> >>>> Somehow all the Matlab shared >> libraries need to be >>>> found when python loads libpylith.so is >> loaded. I >>>> don't know how this is done in Linux. It is really >> a python >>>> question if you want to use a dynamic library in >> python that >>>> uses another shared library how do you make sure >> python gets >>>> all the shared libraries loaded to resolve the >> symbols? >>>> >>>> Barry >>>> >>>>> >>>>> >>>>> Can anyone help/suggest something? >>>>> >>>>> Thanks a lot >>>>> Bir >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> > > > From juhaj at iki.fi Wed Feb 16 08:17:07 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 14:17:07 +0000 Subject: [petsc-users] KSPBuildSolution Message-ID: <201102161417.09649.juhaj@iki.fi> Hi all! I have a problem with KSPBuildSolution. Either there is something wrong with KSPBuildSolution or my with understanding of it. Most likely the latter. I would think from the docs that KSPBuildSolution gives (in its last parameter) the current estimate of the solution and that at the last iteration before the whole thing converges, it would be THE solution. However, this is not true: what KSPBuildSolution gives me is close to zero everywhere (including my right boundary value which is supposed to be 1!) even though the eventual solution returned by SNESSolve() is correct. What exactly is KSPBuildSolution giving me or is it buggy? Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. 
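As the replies below explain, when the KSP sits inside SNESSolve it is solving for the Newton correction, J(u_k) du = -F(u_k), with the new iterate u_{k+1} = u_k + lambda du, so KSPBuildSolution assembles the current estimate of du rather than of u itself. A minimal sketch of inspecting that iterate from a monitor is shown here; it assumes the PETSc of that era, and MyKSPMonitor is an illustrative name, not code from this thread.

#include <petscksp.h>

/* Print the current KSP iterate at every linear iteration.  Inside a SNES
   solve this vector is the Newton correction du, which is why its Dirichlet
   entries look like zero once the boundary values have been picked up. */
static PetscErrorCode MyKSPMonitor(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx)
{
  Vec            x;
  PetscErrorCode ierr;

  ierr = KSPBuildSolution(ksp,PETSC_NULL,&x);CHKERRQ(ierr); /* vector is owned by the KSP; do not destroy it */
  ierr = VecView(x,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  return 0;
}

/* registration, e.g. after SNESGetKSP(snes,&ksp):
   ierr = KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr); */

The converged field itself is only available from the nonlinear solver, e.g. via SNESGetSolution() after SNESSolve() returns.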
URL: From jed at 59A2.org Wed Feb 16 08:23:42 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 15:23:42 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161417.09649.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 15:17, Juha J?ykk? wrote: > even > though the eventual solution returned by SNESSolve() is correct. > SNESSolve uses a Newton method so the linear system is being solving for a defect. If the initial guess is zero, then it would normally pick up your Dirichlet boundary conditions on the first iteration and all subsequent solves would have zero in those locations. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 09:18:43 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 15:18:43 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> Message-ID: <201102161518.47281.juhaj@iki.fi> > SNESSolve uses a Newton method so the linear system is being solving for a So the "solution" in the KSP should actually be identically zero for a converged result? > defect. If the initial guess is zero, then it would normally pick up your > Dirichlet boundary conditions on the first iteration and all subsequent > solves would have zero in those locations. It is not zero initially. But, on the other hand, it has zeros at both ends even at the very first iteration. If I understood your reply correctly, this would be expected for an initial guess which has correct values at the boundaries and it should only pick them up on the first iteration if they were not correct to begin with. Having ruled out a possibility of a bug in KSP, I need to continue my hunt for DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make any difference, it always diverges. The funny thing is, it diverges even if I start with an *exact* *solution*... Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From matijakecman at gmail.com Wed Feb 16 05:54:10 2011 From: matijakecman at gmail.com (Matija Kecman) Date: Wed, 16 Feb 2011 11:54:10 +0000 Subject: [petsc-users] Additive Schwarz Method output variable with processor number Message-ID: Dear?Petsc users, I am new to Petsc and I have been compiling and running some of the examples. I have been investigating the Additive Schwarz Method example (ksp/ksp/examples/tutorials/ex8.c) using the 'Basic method' i.e. by setting the overlap and using the default PETSc decomposition. I was investigating the effect of using multiple processors using the following bash script (-n1, -n2 are the mesh dimensions in the x- and y-directions, -overlap specifies the overlap for the PCASMSetOverlap() routine): for proc in 1 2 3 4; do mpirun -np $proc ex8 -machinesfile machinesfile -n1 500 -n2 500 -overlap 2 -pc_asm_blocks 4 -ksp_monitor_true_residual -sub_ksp_type preonly -sub_pc_type lu > ./log_$proc.dat done After cleaning up the log files and plotting log ( ||Ae||/||Ax|| ) with iteration number I generated the attached figure. 
I am wondering why the number of iterations for convergence depends on the number of processors used??According to the FAQ: 'The convergence of many of the preconditioners in PETSc including the the default parallel preconditioner block Jacobi depends on the number of processes. The more processes the (slightly) slower convergence it has. This is the nature of iterative solvers, the more parallelism means the more "older" information is used in the solution process hence slower convergence.' but I seem to be observing the opposite effect. Many thanks, Matija -------------- next part -------------- A non-text attachment was scrubbed... Name: ASM.pdf Type: application/pdf Size: 7173 bytes Desc: not available URL: From jed at 59A2.org Wed Feb 16 09:25:39 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 16:25:39 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161518.47281.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 16:18, Juha J?ykk? wrote: > So the "solution" in the KSP should actually be identically zero for a > converged result? > Yes > > > defect. If the initial guess is zero, then it would normally pick up your > > Dirichlet boundary conditions on the first iteration and all subsequent > > solves would have zero in those locations. > > It is not zero initially. But, on the other hand, it has zeros at both ends > even at the very first iteration. If I understood your reply correctly, > this > would be expected for an initial guess which has correct values at the > boundaries and it should only pick them up on the first iteration if they > were > not correct to begin with. > > Having ruled out a possibility of a bug in KSP, I need to continue my hunt > for > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > any > difference, it always diverges. The funny thing is, it diverges even if I > start with an *exact* *solution*... > This is a problem. Run with -ksp_converged_reason to find out why it's diverging. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 09:33:29 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 15:33:29 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: <201102161533.32506.juhaj@iki.fi> > > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > This is a problem. Run with -ksp_converged_reason to find out why it's > diverging. Sorry, I looked at a wrong output. For the exact solution, it is the line search, which fails. KSP converges with CONVERGED_RTOL, but SNES quits with DIVERGED_LS_FAILURE. -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 09:44:25 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 09:44:25 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161518.47281.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 9:18 AM, Juha J?ykk? 
wrote: > > SNESSolve uses a Newton method so the linear system is being solving for > a > > So the "solution" in the KSP should actually be identically zero for a > converged result? > It is a correction, and the correction to the exact answer is zero. > > defect. If the initial guess is zero, then it would normally pick up your > > Dirichlet boundary conditions on the first iteration and all subsequent > > solves would have zero in those locations. > > It is not zero initially. But, on the other hand, it has zeros at both ends > even at the very first iteration. If I understood your reply correctly, > this > would be expected for an initial guess which has correct values at the > boundaries and it should only pick them up on the first iteration if they > were > not correct to begin with. > Yes. > Having ruled out a possibility of a bug in KSP, I need to continue my hunt > for > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > any > difference, it always diverges. The funny thing is, it diverges even if I > start with an *exact* *solution*... > It is a good idea to use -ksp_type preonly -pc_type lu to start until you understand the problem. Matt > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 09:39:32 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 16:39:32 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161533.32506.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161533.32506.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 16:33, Juha J?ykk? wrote: > Sorry, I looked at a wrong output. For the exact solution, it is the line > search, which fails. KSP converges with CONVERGED_RTOL, but SNES quits with > DIVERGED_LS_FAILURE. > Your Jacobian may be incorrect, try running with -snes_mf_operator -pc_type lu -ksp_monitor. The linear solves should converge in 1 iteration. You can also try -mat_mffd_type ds if the residual is ill-conditioned. If you can make the problem small, use -snes_type test to check the correctness of the Jacobian directly and -snes_test_display to show how the entries. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 10:04:05 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 16:04:05 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: <201102161604.08073.juhaj@iki.fi> Since I got two emails before I could reply to one, I am replying to both simultaneously. > It is a good idea to use -ksp_type preonly -pc_type lu to start until you > understand the problem. Matthew: Thanks, but I still get diverging line searches. > Your Jacobian may be incorrect, try running with -snes_mf_operator -pc_type > lu -ksp_monitor. The linear solves should converge in 1 iteration. You can > also try -mat_mffd_type ds if the residual is ill-conditioned. Jed: I am running with a FD Jacobian just to make sure my hand-written one is not the culprit. 
Is there any reason to suspect this might be the reason? -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 10:05:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 10:05:46 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161604.08073.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 10:04 AM, Juha J?ykk? wrote: > Since I got two emails before I could reply to one, I am replying to both > simultaneously. > > > It is a good idea to use -ksp_type preonly -pc_type lu to start until you > > understand the problem. > > Matthew: Thanks, but I still get diverging line searches. > > > Your Jacobian may be incorrect, try running with -snes_mf_operator > -pc_type > > lu -ksp_monitor. The linear solves should converge in 1 iteration. You > can > > also try -mat_mffd_type ds if the residual is ill-conditioned. > > Jed: I am running with a FD Jacobian just to make sure my hand-written one > is > not the culprit. Is there any reason to suspect this might be the reason? > Yes, line search failure often occurs for incorrect Jacobians because the solution to the Newton system is not a descent direction, which is checked by the line search. Matt > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 10:06:15 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:06:15 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161604.08073.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 17:04, Juha J?ykk? wrote: > Is there any reason to suspect this might be the reason? Yes, it is the most common place to make programming mistakes and the symptoms you describe are typical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 10:28:09 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:28:09 +0100 Subject: [petsc-users] Additive Schwarz Method output variable with processor number In-Reply-To: References: Message-ID: On Wed, Feb 16, 2011 at 12:54, Matija Kecman wrote: > After cleaning up the log files and plotting log ( ||Ae||/||Ax|| ) > with iteration number I generated the attached figure. I am wondering > why the number of iterations for convergence depends on the number of > processors used? According to the FAQ: > > 'The convergence of many of the preconditioners in PETSc including the > the default parallel preconditioner block Jacobi depends on the number > of processes. The more processes the (slightly) slower convergence it > has. 
This is the nature of iterative solvers, the more parallelism > means the more "older" information is used in the solution process > hence slower convergence.' > > but I seem to be observing the opposite effect. > You are using the same number of subdomains, but they are shaped differently. It seems likely that you have Parmetis installed in which case PCASM uses it to partition multiple subdomains on each process. In this case, those domains are not as good as the rectangular partition that you get by using more processes. Compare: $ mpiexec -n 1 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 -mat_partitioning_type parmetis Linear solve converged due to CONVERGED_RTOL iterations 27 $ mpiexec -n 1 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 -mat_partitioning_type square Linear solve converged due to CONVERGED_RTOL iterations 22 $ mpiexec -n 4 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 Linear solve converged due to CONVERGED_RTOL iterations 22 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 10:31:47 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 16:31:47 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: <201102161631.50314.juhaj@iki.fi> > Yes, it is the most common place to make programming mistakes and the > symptoms you describe are typical. Please let me double-check there has not been a misunderstanding here: the problems I describe occur with the PETSc built-in FD Jacobian approximation, not my own. Now, I realise this will be a less-than-optimal approximation, but I fail to see how there could be a programming mistake, when I am using SNESDefaultComputeJacobianColor and not my hand-written Jacobian. I do get the same symptoms with the hand-written one, too. That's why I wanted to check with the PETSc built in FD version. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From jed at 59A2.org Wed Feb 16 10:37:06 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:37:06 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161631.50314.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 17:31, Juha J?ykk? wrote: > Please let me double-check there has not been a misunderstanding here: the > problems I describe occur with the PETSc built-in FD Jacobian > approximation, > not my own. Now, I realise this will be a less-than-optimal approximation, > but > I fail to see how there could be a programming mistake, when I am using > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > I do get the same symptoms with the hand-written one, too. That's why I > wanted > to check with the PETSc built in FD version. > If your system is poorly scaled or genuinely ill-conditioned, the FD Jacobian could be bad. 
Sometimes it helps to use a more robust method of determining the differencing parameter: -mat_fd_type ds (when using coloring) or -mat_mffd_type ds (when using -snes_mf_operator). You can also try solving the linear system to higher tolerance and looking at the true residual to be sure the linear system really is solved accurately. What sort of problem are you solving? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 16 10:40:50 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 10:40:50 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161631.50314.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> Message-ID: <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Try using SNESDefaultComputeJacobian() see if that makes any difference. 99.9% of the causes of non-convergencing Newton are wrong or slightly wrong Jacobians. Very unlikely possibilities are 1) it is converging to a local minimum that is not a solution. This is checked by PETSc automatically if the line search failed so is unlikely to be the problem. But run with -info and it will print a great deal of information about the nonlinear solver including a message about " near zero implies" cut and paste all the message about the "near zero" and send it to us. 2) the function is not smooth so Newton's taylor series approximation simply doesn't work. Barry On Feb 16, 2011, at 10:31 AM, Juha J?ykk? wrote: >> Yes, it is the most common place to make programming mistakes and the >> symptoms you describe are typical. > > Please let me double-check there has not been a misunderstanding here: the > problems I describe occur with the PETSc built-in FD Jacobian approximation, > not my own. Now, I realise this will be a less-than-optimal approximation, but > I fail to see how there could be a programming mistake, when I am using > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > I do get the same symptoms with the hand-written one, too. That's why I wanted > to check with the PETSc built in FD version. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From knepley at gmail.com Wed Feb 16 11:42:14 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 11:42:14 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Message-ID: On Wed, Feb 16, 2011 at 10:40 AM, Barry Smith wrote: > > Try using SNESDefaultComputeJacobian() see if that makes any difference. > > 99.9% of the causes of non-convergencing Newton are wrong or slightly > wrong Jacobians. Very unlikely possibilities are > > 1) it is converging to a local minimum that is not a solution. This is > checked by PETSc automatically if the line search failed so is unlikely to > be the problem. But run with -info and it will print a great deal of > information about the nonlinear solver including a message about " near > zero implies" cut and paste all the message about the "near zero" and send > it to us. > > 2) the function is not smooth so Newton's taylor series approximation > simply doesn't work. 
Unlikely possibility #3: You have written an equation with no real solutions, meaning there is a mistake in your function. Matt > > Barry > > On Feb 16, 2011, at 10:31 AM, Juha J?ykk? wrote: > > >> Yes, it is the most common place to make programming mistakes and the > >> symptoms you describe are typical. > > > > Please let me double-check there has not been a misunderstanding here: > the > > problems I describe occur with the PETSc built-in FD Jacobian > approximation, > > not my own. Now, I realise this will be a less-than-optimal > approximation, but > > I fail to see how there could be a programming mistake, when I am using > > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > > > I do get the same symptoms with the hand-written one, too. That's why I > wanted > > to check with the PETSc built in FD version. > > > > Cheers, > > Juha > > > > -- > > ----------------------------------------------- > > | Juha J?ykk?, juhaj at iki.fi | > > | http://www.maths.leeds.ac.uk/~juhaj | > > ----------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 13:39:42 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 19:39:42 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Message-ID: <201102161939.45779.juhaj@iki.fi> > > 1) it is converging to a local minimum that is not a solution. This is > > checked by PETSc automatically if the line search failed so is unlikely > > to be the problem. But run with -info and it will print a great deal of > > information about the nonlinear solver including a message about " near > > zero implies" cut and paste all the message about the "near zero" and > > send it to us. There is just one and it is the last message before the usual -snes_monitor output: [0] SNESLSCheckLocalMin_Private(): (F^T J random)/(|| F ||*||random|| 20.4682 near zero implies found a local minimum This is with SNES...JacobianColor(). With my own Jacobian, there are none. > > 2) the function is not smooth so Newton's taylor series approximation > > simply doesn't work. Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or the function evaluated by FormFunction? I find it unlikely that the solution would not be at least twice differentiable (apart from the endpoints, where it is not): it is almost guaranteed to be since the equation is the Euler-Lagrange equation of a well behaved action integral. As for F(u), the function is a polynomial of u and x (x being the coordinate), so it is smooth, if u is. > Unlikely possibility #3: > > You have written an equation with no real solutions, meaning there is a > mistake in your function. But my initial guess is an exact solution. I have two free parameters in the equation and for a single choice I can find an exact solution - it happens to be u(x) = x, so the discrete derivatives are exactly the same as the continuous ones (apart from floating point rounding errors, of course). Now, it is quite possible that my problem is poorly scaled or ill-conditioned, like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices somehow? 
-Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 13:44:13 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 13:44:13 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161939.45779.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> <201102161939.45779.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 1:39 PM, Juha J?ykk? wrote: > > > 1) it is converging to a local minimum that is not a solution. This is > > > checked by PETSc automatically if the line search failed so is unlikely > > > to be the problem. But run with -info and it will print a great deal of > > > information about the nonlinear solver including a message about " > near > > > zero implies" cut and paste all the message about the "near zero" and > > > send it to us. > > There is just one and it is the last message before the usual -snes_monitor > output: > > [0] SNESLSCheckLocalMin_Private(): (F^T J random)/(|| F ||*||random|| > 20.4682 > near zero implies found a local minimum > > This is with SNES...JacobianColor(). With my own Jacobian, there are none. > > > > 2) the function is not smooth so Newton's taylor series approximation > > > simply doesn't work. > > Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or > the function evaluated by FormFunction? > > I find it unlikely that the solution would not be at least twice > differentiable (apart from the endpoints, where it is not): it is almost > guaranteed to be since the equation is the Euler-Lagrange equation of a > well > behaved action integral. > > As for F(u), the function is a polynomial of u and x (x being the > coordinate), > so it is smooth, if u is. > > > Unlikely possibility #3: > > > > You have written an equation with no real solutions, meaning there is a > > mistake in your function. > > But my initial guess is an exact solution. I have two free parameters in > the > equation and for a single choice I can find an exact solution - it happens > to > be u(x) = x, so the discrete derivatives are exactly the same as the > continuous ones (apart from floating point rounding errors, of course). > Wait, if your initial guess is an exact solution, there should be no KSP solve. Matt > Now, it is quite possible that my problem is poorly scaled or > ill-conditioned, > like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices > somehow? > > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
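Barry's suggestion above, trying SNESDefaultComputeJacobian(), amounts to swapping the routine handed to SNESSetJacobian(). A minimal sketch under the calling convention of the petsc-dev of that time (later releases rename it SNESComputeJacobianDefault); snes, J and ierr are assumed to exist already, and FormJacobian and user are placeholders for the application's own routine and context:

/* the hand-coded Jacobian under suspicion */
ierr = SNESSetJacobian(snes,J,J,FormJacobian,&user);CHKERRQ(ierr);

/* brute-force finite differences: dense and slow, but a useful cross-check on
   small problems; this is what the -snes_fd option selects at run time */
ierr = SNESSetJacobian(snes,J,J,SNESDefaultComputeJacobian,PETSC_NULL);CHKERRQ(ierr);

If the two give visibly different convergence histories, the hand-coded routine is the first suspect.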
URL: From jed at 59A2.org Wed Feb 16 13:49:33 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 20:49:33 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161939.45779.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> <201102161939.45779.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 20:39, Juha J?ykk? wrote: > Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or > the function evaluated by FormFunction? > > I find it unlikely that the solution would not be at least twice > differentiable (apart from the endpoints, where it is not): it is almost > guaranteed to be since the equation is the Euler-Lagrange equation of a > well > behaved action integral. > > As for F(u), the function is a polynomial of u and x (x being the > coordinate), > so it is smooth, if u is. > We just want F(u) to be continuously differentiable as a function from R^n to R^n (were n is the size of u). > > > Unlikely possibility #3: > > > > You have written an equation with no real solutions, meaning there is a > > mistake in your function. > > But my initial guess is an exact solution. > So the initial SNES residual is nearly zero? Then the differencing and the solve could be dominated by rounding error. > I have two free parameters in the > equation and for a single choice I can find an exact solution - it happens > to > be u(x) = x, so the discrete derivatives are exactly the same as the > continuous ones (apart from floating point rounding errors, of course). > > Now, it is quite possible that my problem is poorly scaled or > ill-conditioned, > like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices > somehow? > There are a few ways to do it automatically as part of the solve. These shows you spectral information for the preconditioned operator so run with -pc_type none to see the true spectrum. You typically need to use GMRES for these to work: -ksp_monitor_singular_value : Monitor singular values (KSPMonitorSet) -ksp_compute_singularvalues: Compute singular values of preconditioned operator (KSPSetComputeSingularValues) -ksp_compute_eigenvalues: Compute eigenvalues of preconditioned operator (KSPSetComputeSingularValues) -ksp_plot_eigenvalues: Scatter plot extreme eigenvalues (KSPSetComputeSingularValues) -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 18:11:45 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 00:11:45 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161939.45779.juhaj@iki.fi> Message-ID: <201102170011.54502.juhaj@iki.fi> Hi again. I do not know which one to answer to, but thanks to all. I got one step further with your help: [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! Adding -pc_factor_shift_type POSITIVE_DEFINITE helps and now I get at least some progress. It does not matter if I use -snes_fd, coloring or my hand- written Jacobian. If I start from a non-solution initial guess, they all progress somewhat towards what I believe is more or less a real solution. This is all ran with -pc_type lu -ksp_type preonly, btw. However, they do not seem to change the second to the last (counting from left to right) value on the lattice (this is 1D). It does change, but only by 0.02% while values elsewhere in the lattice change much more significantly. 
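Put together, a run exposing the unpreconditioned spectrum might look like the line below. The executable name is a placeholder; the options are the ones listed above, and GMRES is spelled out only for clarity since it is the default KSP.

./myprogram -snes_monitor -pc_type none -ksp_type gmres -ksp_monitor_singular_value -ksp_compute_eigenvalues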
Again, I am running with the parameters where I know the exact solution and my initial guess is, on purpose, 0.01 off the correct one (except at the endpoints). What could be causing this? One thing strikes me as odd. I tried checking the value of the function at the first iteration and at that particular point it is of the order of -1.e-9, whereas in the next point to the left, it is +1.e-9 and then starts increasing towards the left before it starts decreasing again around half-way through the lattice. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 18:30:45 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 18:30:45 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102170011.54502.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161939.45779.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 6:11 PM, Juha J?ykk? wrote: > Hi again. > > I do not know which one to answer to, but thanks to all. I got one step > further with your help: > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > You have a singular Jacobian, which leads me to believe that your boundary conditions are incorrect. Matt > Adding -pc_factor_shift_type POSITIVE_DEFINITE helps and now I get at least > some progress. It does not matter if I use -snes_fd, coloring or my hand- > written Jacobian. If I start from a non-solution initial guess, they all > progress somewhat towards what I believe is more or less a real solution. > This > is all ran with -pc_type lu -ksp_type preonly, btw. > > However, they do not seem to change the second to the last (counting from > left > to right) value on the lattice (this is 1D). It does change, but only by > 0.02% > while values elsewhere in the lattice change much more significantly. > Again, I > am running with the parameters where I know the exact solution and my > initial > guess is, on purpose, 0.01 off the correct one (except at the endpoints). > What > could be causing this? > > One thing strikes me as odd. I tried checking the value of the function at > the > first iteration and at that particular point it is of the order of -1.e-9, > whereas in the next point to the left, it is +1.e-9 and then starts > increasing > towards the left before it starts decreasing again around half-way through > the > lattice. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgrabbani at gmail.com Wed Feb 16 21:18:21 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Wed, 16 Feb 2011 19:18:21 -0800 Subject: [petsc-users] Removing petsc Message-ID: Hi, I am new to petsc and also new to linux/unix environment. 
Recently I started converting a matlab nanodevice simulaiton code into a c code using petsc and have done most of the converting. But in the process I have installed petsc 4/5 times with different configuration options (real, complex, with blopex, etc). Now that I do not need some of them, I want to uninstall them. Can I simply delete the unwanted versions (all the versions are in separate folders)? Please advise me if something different is needed. Thanks in advance, Golam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 16 21:20:35 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 21:20:35 -0600 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: If you just used different PETSC_ARCH variables like arch-complex arch-opt etc you can just delete those directories and that will remove everything related to them. Barry On Feb 16, 2011, at 9:18 PM, Golam Rabbani wrote: > Hi, > > I am new to petsc and also new to linux/unix environment. > Recently I started converting a matlab nanodevice simulaiton code into a c code using petsc and have done most of the converting. > But in the process I have installed petsc 4/5 times with different configuration options (real, complex, with blopex, etc). Now that I > do not need some of them, I want to uninstall them. Can I simply delete the unwanted versions (all the versions are in separate folders)? > Please advise me if something different is needed. > > Thanks in advance, > Golam From elhombrefr at hotmail.fr Thu Feb 17 08:01:14 2011 From: elhombrefr at hotmail.fr (El Hombre Frances) Date: Thu, 17 Feb 2011 15:01:14 +0100 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran Message-ID: <4D5D2A2A.2070707@hotmail.fr> Hi, I'm looking for how to get information about the DA of a DMMG finest grid. I saw that you can access them with call DMMGGetDA(... call DAGetInfo(... but it doesn't work in the main program, i get this error 0]PETSC ERROR: Invalid argument! [0]PETSC ERROR: Wrong type of object: Parameter # 1! I tried with this example http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html I don't know how to get DA infos outside the subroutines ComputeRHS and ComputeJacobian I want grid size of the DA 3D in order to plot the field. I noticed that it's possible with PetscObjectQuery in C language. Thanks for your help Pierre Navaro From juhaj at iki.fi Thu Feb 17 08:08:37 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 14:08:37 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> Message-ID: <201102171408.40609.juhaj@iki.fi> > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > You have a singular Jacobian, which leads me to believe that your boundary > conditions are incorrect. Thanks for the tip. I also found the following in the -info output: [0] SNESLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 27.9594 near zero implies inconsistent rhs which is strange at first sight: my RHS is my equation, how can that be inconsistent? It is what defines the problem. BUT your comment about the boundary conditions led me to look into them in more detail. It seems one of them may be trivially satisfied, leaving me with an infinite number of solutions satisfying the other one too. Could this be the reason for my problems? 
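The non-uniqueness in this example can be made explicit. For f(r) = a r^b one has f'/f = b/r and f''/f = b(b-1)/r^2, so

  r f''/f - r (f'/f)^2 + f'/f = [ b(b-1) - b^2 + b ] / r = 0

for every a and b; the conditions f(0)=0, f(1)=1 only fix a=1 and leave b>0 free. Differentiating the family r^b with respect to b gives v(r) = r^b ln(r), which vanishes at both ends and satisfies the linearized equation, so the Jacobian at any such solution has a null vector. That is consistent with the zero-pivot and "near zero implies" messages reported earlier in the thread.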
If so, I need to start thinking of another boundary condition... not nice, since the problem I am solving does not really give them! =( Take an example: r f''/f - r (f'/f)^2 + f'/f = 0. This has the general solution f(r) = a r^b, but if the boundary conditions are f(0)=0 and f(1)=1 (like I had in my real problem), we have (if b>0): a*0 = 0 a*1 = 1 and b is left undetermined or (if b<0): lim(a*r^b) = 0 as r->0 => a=0 0*1^b = 1 => no solution. I am unsure how to interpret b=0 since then we have 0^0 at the boundary. But even disregarding the undefined value at the boundary, there cannot be a continuous solution with b=0 since then f(r>0)=1, but f(0)=0. Looks like Heaviside step to me. In summary, I cannot even solve this simpler problem with SNES, but I think I understand the reason now: I have not specified boundary conditions which would give a unique solution - they probably do not even give a finite number of solutions, but an inifinite (quite likely uncountable, like in my example) number of solutions. Any thoughts on this? Is there any sense in my analysis? If so, I will need to go think about Neumann boundaries... Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Thu Feb 17 08:15:01 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 08:15:01 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102171408.40609.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> Message-ID: On Thu, Feb 17, 2011 at 8:08 AM, Juha J?ykk? wrote: > > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > > You have a singular Jacobian, which leads me to believe that your > boundary > > conditions are incorrect. > > Thanks for the tip. I also found the following in the -info output: > > [0] SNESLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 27.9594 near zero > implies inconsistent rhs > > which is strange at first sight: my RHS is my equation, how can that be > inconsistent? It is what defines the problem. BUT your comment about the > boundary conditions led me to look into them in more detail. > > It seems one of them may be trivially satisfied, leaving me with an > infinite > number of solutions satisfying the other one too. Could this be the reason > for > my problems? If so, I need to start thinking of another boundary > condition... > not nice, since the problem I am solving does not really give them! =( > > Take an example: > > r f''/f - r (f'/f)^2 + f'/f = 0. > > This has the general solution > > f(r) = a r^b, > > but if the boundary conditions are f(0)=0 and f(1)=1 (like I had in my real > problem), we have (if b>0): > > a*0 = 0 > a*1 = 1 > > and b is left undetermined or (if b<0): > > lim(a*r^b) = 0 as r->0 => a=0 > 0*1^b = 1 => no solution. > > I am unsure how to interpret b=0 since then we have 0^0 at the boundary. > But > even disregarding the undefined value at the boundary, there cannot be a > continuous solution with b=0 since then f(r>0)=1, but f(0)=0. Looks like > Heaviside step to me. 
> > In summary, I cannot even solve this simpler problem with SNES, but I think > I > understand the reason now: I have not specified boundary conditions which > would give a unique solution - they probably do not even give a finite > number > of solutions, but an inifinite (quite likely uncountable, like in my > example) > number of solutions. > > Any thoughts on this? Is there any sense in my analysis? If so, I will need > to > go think about Neumann boundaries... Yes, if your BC do not give at least a locally unique solution, then your Jacobian will be rank deficient and Newton breaks down. You can still try Picard, but I recommend understanding what you mean by a solution first. Matt > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 17 08:19:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 08:19:46 -0600 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: <4D5D2A2A.2070707@hotmail.fr> References: <4D5D2A2A.2070707@hotmail.fr> Message-ID: On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances wrote: > Hi, > I'm looking for how to get information about the DA of a DMMG finest grid. > I saw that you can access them with > call DMMGGetDA(... > call DAGetInfo(... > but it doesn't work in the main program, i get this error > 0]PETSC ERROR: Invalid argument! > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > 1) Always send the COMPLETE error message 2) That should work. Notice that we use that on line 75 Matt > I tried with this example > > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html > > I don't know how to get DA infos outside the subroutines ComputeRHS and > ComputeJacobian > > I want grid size of the DA 3D in order to plot the field. I noticed that > it's possible with PetscObjectQuery in C language. > > Thanks for your help > > Pierre Navaro > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From elhombrefr at hotmail.fr Thu Feb 17 09:00:44 2011 From: elhombrefr at hotmail.fr (Pierre Navaro) Date: Thu, 17 Feb 2011 16:00:44 +0100 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: References: <4D5D2A2A.2070707@hotmail.fr> Message-ID: <4D5D381C.7010207@hotmail.fr> Hi I add these lines on line 54 call DMMGGetDA(dmmg,db,ierr) call DAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,ierr) print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz and i get these errors [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Invalid argument! [0]PETSC ERROR: Wrong type of object: Parameter # 1! 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex22 on a arch-osx- named m-navaro.u-strasbg.fr by navaro Thu Feb 17 15:59:43 2011 [0]PETSC ERROR: Libraries linked from /opt/petsc-3.1-p7/arch-osx-10.6/lib [0]PETSC ERROR: Configure run at Tue Jan 25 09:41:36 2011 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc="gfortran -m64" --with-cxx=g++ --download-mpich=1 PETSC_ARCH=arch-osx-10.6 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: DAGetInfo() line 309 in src/dm/da/src/daview.c mx,my,mz = 0 0 0 0 Best regards Pierre On 17/02/11 15:19, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances > > wrote: > > Hi, > I'm looking for how to get information about the DA of a DMMG > finest grid. > I saw that you can access them with > call DMMGGetDA(... > call DAGetInfo(... > but it doesn't work in the main program, i get this error > 0]PETSC ERROR: Invalid argument! > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > > > 1) Always send the COMPLETE error message > > 2) That should work. Notice that we use that on line 75 > > Matt > > I tried with this example > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html > > I don't know how to get DA infos outside the subroutines > ComputeRHS and ComputeJacobian > > I want grid size of the DA 3D in order to plot the field. I > noticed that it's possible with PetscObjectQuery in C language. > > Thanks for your help > > Pierre Navaro > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- Pierre NAVARO IRMA - UMR 7501 CNRS/Universite de Strasbourg - Bureau i101 7 rue Rene Descartes F-67084 Strasbourg Cedex, FRANCE. tel : (33) [0]3 68 85 01 73, fax : (33) [0]3 68 85 01 05 http://www-irma.u-strasbg.fr/~navaro -------------- next part -------------- An HTML attachment was scrubbed... 
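For readers following the DMMG discussion above in C rather than Fortran: with the petsc-3.1 API, the finest-level DA can be obtained from the DMMG array with DMMGGetDA() and then queried with DAGetInfo(). The sketch below only illustrates that pattern; the helper name PrintFinestGridSizes is made up for this example, the unused DAGetInfo() outputs are simply passed as PETSC_NULL, and the code has not been run against this exact PETSc build.

#include "petscdmmg.h"

/* Illustrative helper (hypothetical name, not from the thread): print the
   global grid sizes of the finest DA held by a DMMG hierarchy. */
PetscErrorCode PrintFinestGridSizes(DMMG *dmmg)
{
  DA             da;
  PetscInt       mx,my,mz;
  PetscErrorCode ierr;

  da   = DMMGGetDA(dmmg);                 /* DA of the finest level */
  ierr = DAGetInfo(da,PETSC_NULL,&mx,&my,&mz,
                   PETSC_NULL,PETSC_NULL,PETSC_NULL,
                   PETSC_NULL,PETSC_NULL,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"mx,my,mz = %D %D %D\n",mx,my,mz);CHKERRQ(ierr);
  return 0;
}

The same information can be printed from Fortran once the correct object is handed to the info routine, as the reply just below shows.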
URL: From knepley at gmail.com Thu Feb 17 09:29:44 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 09:29:44 -0600 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: <4D5D381C.7010207@hotmail.fr> References: <4D5D2A2A.2070707@hotmail.fr> <4D5D381C.7010207@hotmail.fr> Message-ID: On Thu, Feb 17, 2011 at 9:00 AM, Pierre Navaro wrote: > Hi > I add these lines on line 54 > call DMMGGetDA(dmmg,db,ierr) > call DAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,ierr) > print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz > Fortran is always so easy: DMMG dmmg, dmmgLevel DM da, db PetscInt mx,my,mz call DMMGArrayGetDMMG(dmmg,dmmgLevel,ierr) call DMMGGetDM(dmmgLevel,db,ierr) call DMDAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,ierr) print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz Matt > and i get these errors > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Invalid argument! > > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 > CST 2010 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./ex22 on a arch-osx- named m-navaro.u-strasbg.fr by > navaro Thu Feb 17 15:59:43 2011 > [0]PETSC ERROR: Libraries linked from /opt/petsc-3.1-p7/arch-osx-10.6/lib > [0]PETSC ERROR: Configure run at Tue Jan 25 09:41:36 2011 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc="gfortran -m64" > --with-cxx=g++ --download-mpich=1 PETSC_ARCH=arch-osx-10.6 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: DAGetInfo() line 309 in src/dm/da/src/daview.c > mx,my,mz = 0 0 0 0 > > Best regards > Pierre > > > On 17/02/11 15:19, Matthew Knepley wrote: > > On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances wrote: > >> Hi, >> I'm looking for how to get information about the DA of a DMMG finest grid. >> I saw that you can access them with >> call DMMGGetDA(... >> call DAGetInfo(... >> but it doesn't work in the main program, i get this error >> 0]PETSC ERROR: Invalid argument! >> [0]PETSC ERROR: Wrong type of object: Parameter # 1! >> > > 1) Always send the COMPLETE error message > > 2) That should work. Notice that we use that on line 75 > > Matt > > >> I tried with this example >> >> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html >> >> I don't know how to get DA infos outside the subroutines ComputeRHS and >> ComputeJacobian >> >> I want grid size of the DA 3D in order to plot the field. I noticed that >> it's possible with PetscObjectQuery in C language. 
>> >> Thanks for your help >> >> Pierre Navaro >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > -- > Pierre NAVARO > IRMA - UMR 7501 CNRS/Universite de Strasbourg - Bureau i101 > 7 rue Rene Descartes F-67084 Strasbourg Cedex, FRANCE. > tel : (33) [0]3 68 85 01 73, fax : (33) [0]3 68 85 01 05http://www-irma.u-strasbg.fr/~navaro > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Thu Feb 17 10:06:24 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 09:06:24 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI Message-ID: <1297958784.14179.11.camel@echo.lanl.gov> So I thought I understood how VecScatters worked, but apparently not. Is it possible to create a general VecScatter from an arbitrarily partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with the same global sizes (or same global IS sizes) but different local sizes? Shouldn't this just be a matter of relying upon the implied LocalToGlobalMapping? See below snippet (and its errors): Ethan Vec vA Vec vB VecScatter scatter_AB PetscInt np PetscInt rank PetscErrorCode ierr if (rank.eq.0) np = 3 if (rank.eq.1) np = 1 call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, scatter_AB, ierr) ... $> mpiexec -n 2 ./test [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Nonconforming object sizes! [0]PETSC ERROR: Local scatter sizes don't match! [0]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Nonconforming object sizes! [1]PETSC ERROR: Local scatter sizes don't match! [0]PETSC ERROR: Petsc Development HG revision: 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 2011 -0600 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 08:14:57 2011 [0]PETSC ERROR: Libraries linked from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 [0]PETSC ERROR: Configure options --with-debugging=1 --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: VecScatterCreate() line 1432 in src/vec/vec/utils/vscat.c application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From knepley at gmail.com Thu Feb 17 10:35:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 10:35:37 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297958784.14179.11.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but apparently not. > Is it possible to create a general VecScatter from an arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with > the same global sizes (or same global IS sizes) but different local > sizes? Shouldn't this just be a matter of relying upon the implied > LocalToGlobalMapping? > No, the way you have to do this is to map a global Vec to a bunch of sequential local Vecs with the sizes you want. This is also how we map to overlapping arrays. Matt > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared > --download-mpich=1 --download-ml=1 --download-umfpack=1 > --with-blas-lapack-dir=/usr/lib --download-parmetis=yes > PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 > --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominik at itis.ethz.ch Thu Feb 17 10:37:35 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Thu, 17 Feb 2011 17:37:35 +0100 Subject: [petsc-users] custom compiler flags on Windows Message-ID: I need to use some special compile flags when compiling with 'cl' on Windows. While configuring I currently use --with-cxx='win32fe cl', which works fine, but if I add some flags after cl the configure brakes, complaining that the compiler does not work. I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the same as before. Is there a way to specify my own flags with Petsc (or add to them)? Best regards, Dominik From knepley at gmail.com Thu Feb 17 10:59:44 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 10:59:44 -0600 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: > I need to use some special compile flags when compiling with 'cl' on > Windows. > While configuring I currently use --with-cxx='win32fe cl', which works > fine, but if I add some flags after cl the configure brakes, > complaining that the compiler does not work. > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the > same as before. > Is there a way to specify my own flags with Petsc (or add to them)? > --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" Matt > Best regards, > Dominik -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
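To make the answer above concrete, the configure line being suggested has roughly the following shape; the cl switches shown (/MD, /O2) are only placeholders for whatever project-specific flags are needed, and the remaining options (MPI, BLAS/LAPACK, and so on) are left out of this sketch.

./config/configure.py --with-cc='win32fe cl' --with-cxx='win32fe cl' \
    --COPTFLAGS='/MD /O2' --CXXOPTFLAGS='/MD /O2'

As Satish explains further down, CFLAGS/CXXFLAGS can also be set directly, but then the defaults PETSc normally adds for the Microsoft compilers have to be repeated by hand, so the *OPTFLAGS route is usually the less error-prone one.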
URL: From ecoon at lanl.gov Thu Feb 17 11:18:27 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 10:18:27 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: <1297963107.14179.24.camel@echo.lanl.gov> On Thu, 2011-02-17 at 10:35 -0600, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but > apparently not. > Is it possible to create a general VecScatter from an > arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) > Vec with > the same global sizes (or same global IS sizes) but different > local > sizes? Shouldn't this just be a matter of relying upon the > implied > LocalToGlobalMapping? > > > No, the way you have to do this is to map a global Vec to a bunch of > sequential local Vecs with the sizes you want. This is also how we map > to overlapping arrays. > So effectively I need two scatters -- a scatter from the global Vec to the sequential local Vecs, then a scatter (which requires no communication) to inject the sequential Vecs into the new global Vec? Why? Am I missing something that makes the MPI to MPI scatter ill-posed as long as the global sizes (but not local sizes) are equal? This is mostly curiosity on my part... I think I have to do two scatters anyway since I'm working with multiple comms -- scatter from an MPI Vec on one sub-comm into local, sequential Vecs, then scatter those sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's the correct model for injecting an MPI Vec on one comm into an MPI Vec on PETSC_COMM_WORLD, correct? Ethan > > Matt > > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, > ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, > ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 > 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent > updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble > shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu > Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: > Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From knepley at gmail.com Thu Feb 17 11:22:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 11:22:46 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297963107.14179.24.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> <1297963107.14179.24.camel@echo.lanl.gov> Message-ID: On Thu, Feb 17, 2011 at 11:18 AM, Ethan Coon wrote: > On Thu, 2011-02-17 at 10:35 -0600, Matthew Knepley wrote: > > On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > > So I thought I understood how VecScatters worked, but > > apparently not. > > Is it possible to create a general VecScatter from an > > arbitrarily > > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) > > Vec with > > the same global sizes (or same global IS sizes) but different > > local > > sizes? Shouldn't this just be a matter of relying upon the > > implied > > LocalToGlobalMapping? > > > > > > No, the way you have to do this is to map a global Vec to a bunch of > > sequential local Vecs with the sizes you want. This is also how we map > > to overlapping arrays. > > > > So effectively I need two scatters -- a scatter from the global Vec to > the sequential local Vecs, then a scatter (which requires no > communication) to inject the sequential Vecs into the new global Vec? > No, just wrap up the pieces of your global Vec as local Vecs and scatter straight into that storage using VecCreateSeqWithArray(). Matt > Why? Am I missing something that makes the MPI to MPI scatter ill-posed > as long as the global sizes (but not local sizes) are equal? > > This is mostly curiosity on my part... 
I think I have to do two scatters > anyway since I'm working with multiple comms -- scatter from an MPI Vec > on one sub-comm into local, sequential Vecs, then scatter those > sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's the correct > model for injecting an MPI Vec on one comm into an MPI Vec on > PETSC_COMM_WORLD, correct? > > Ethan > > > > > Matt > > > > See below snippet (and its errors): > > > > Ethan > > > > > > > > Vec vA > > Vec vB > > VecScatter scatter_AB > > > > PetscInt np > > PetscInt rank > > PetscErrorCode ierr > > > > if (rank.eq.0) np = 3 > > if (rank.eq.1) np = 1 > > > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, > > ierr) > > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, > > ierr) > > > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > > PETSC_NULL_OBJECT, > > scatter_AB, ierr) > > > > ... > > > > $> mpiexec -n 2 ./test > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Nonconforming object sizes! > > [0]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [1]PETSC ERROR: Nonconforming object sizes! > > [1]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: Petsc Development HG revision: > > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 > > 15:44:04 > > 2011 -0600 > > [0]PETSC ERROR: See docs/changes/index.html for recent > > updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble > > shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu > > Feb 17 > > 08:14:57 2011 > > [0]PETSC ERROR: Libraries linked > > from > /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > > [0]PETSC ERROR: Configure options --with-debugging=1 > > > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared > --download-mpich=1 --download-ml=1 --download-umfpack=1 > --with-blas-lapack-dir=/usr/lib --download-parmetis=yes > PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 > --with-shared-libraries=1 --download-hdf5=1 > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: VecScatterCreate() line 1432 in > > src/vec/vec/utils/vscat.c > > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: > > Hangup > > (signal 1) > > > > > > > > > > -- > > ------------------------------------ > > Ethan Coon > > Post-Doctoral Researcher > > Applied Mathematics - T-5 > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------ > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. 
> > -- Norbert Wiener > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Thu Feb 17 11:31:06 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 10:31:06 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: References: <1297958784.14179.11.camel@echo.lanl.gov> <1297963107.14179.24.camel@echo.lanl.gov> Message-ID: <1297963866.14179.25.camel@echo.lanl.gov> > > So effectively I need two scatters -- a scatter from the > global Vec to > the sequential local Vecs, then a scatter (which requires no > communication) to inject the sequential Vecs into the new > global Vec? > > > No, just wrap up the pieces of your global Vec as local Vecs and > scatter > straight into that storage using VecCreateSeqWithArray(). > Ah ha! Thanks, Ethan > > Matt > > Why? Am I missing something that makes the MPI to MPI scatter > ill-posed > as long as the global sizes (but not local sizes) are equal? > > This is mostly curiosity on my part... I think I have to do > two scatters > anyway since I'm working with multiple comms -- scatter from > an MPI Vec > on one sub-comm into local, sequential Vecs, then scatter > those > sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's > the correct > model for injecting an MPI Vec on one comm into an MPI Vec on > PETSC_COMM_WORLD, correct? > > Ethan > > > > > > Matt > > > > See below snippet (and its errors): > > > > Ethan > > > > > > > > Vec vA > > Vec vB > > VecScatter scatter_AB > > > > PetscInt np > > PetscInt rank > > PetscErrorCode ierr > > > > if (rank.eq.0) np = 3 > > if (rank.eq.1) np = 1 > > > > call VecCreateMPI(PETSC_COMM_WORLD, 2, > PETSC_DETERMINE, vA, > > ierr) > > call VecCreateMPI(PETSC_COMM_WORLD, np, > PETSC_DETERMINE, vB, > > ierr) > > > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > > PETSC_NULL_OBJECT, > > scatter_AB, ierr) > > > > ... > > > > $> mpiexec -n 2 ./test > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Nonconforming object sizes! > > [0]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [1]PETSC ERROR: Nonconforming object sizes! > > [1]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: Petsc Development HG revision: > > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: > Fri Feb 11 > > 15:44:04 > > 2011 -0600 > > [0]PETSC ERROR: See docs/changes/index.html for > recent > > updates. > > [0]PETSC ERROR: See docs/faq.html for hints about > trouble > > shooting. > > [0]PETSC ERROR: See docs/index.html for manual > pages. 
> > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by > ecoon Thu > > Feb 17 > > 08:14:57 2011 > > [0]PETSC ERROR: Libraries linked > > > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 > 2011 > > [0]PETSC ERROR: Configure options --with-debugging=1 > > > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: VecScatterCreate() line 1432 in > > src/vec/vec/utils/vscat.c > > application called MPI_Abort(MPI_COMM_WORLD, 60) - > process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 60) - > process 0 > > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT > STRING: > > Hangup > > (signal 1) > > > > > > > > > > -- > > ------------------------------------ > > Ethan Coon > > Post-Doctoral Researcher > > Applied Mathematics - T-5 > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------ > > > > > > > > > > -- > > What most experimenters take for granted before they begin > their > > experiments is infinitely more interesting than any results > to which > > their experiments lead. > > -- Norbert Wiener > > > -- > > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From balay at mcs.anl.gov Thu Feb 17 12:00:51 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Feb 2011 12:00:51 -0600 (CST) Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: On Thu, 17 Feb 2011, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: > > > I need to use some special compile flags when compiling with 'cl' on > > Windows. > > While configuring I currently use --with-cxx='win32fe cl', which works > > fine, but if I add some flags after cl the configure brakes, > > complaining that the compiler does not work. > > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the > > same as before. > > Is there a way to specify my own flags with Petsc (or add to them)? > > > > --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" Generally CFLAGS should work. However with MS compilers - we have some defaults without which the compilers might not work. [esp with mpi]. So when changing CFLAGS one might have to include the defaults plus the additional flags. However COPTFLAGS migh be easier to add to CFLAGS - and provided to primarily specify optimization flags - but can be used for for other flags aswell.. 
Satish From dominik at itis.ethz.ch Thu Feb 17 12:07:22 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Thu, 17 Feb 2011 19:07:22 +0100 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: Many thanks for your explanations! On Thu, Feb 17, 2011 at 7:00 PM, Satish Balay wrote: > On Thu, 17 Feb 2011, Matthew Knepley wrote: > >> On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: >> >> > I need to use some special compile flags when compiling with 'cl' on >> > Windows. >> > While configuring I currently use --with-cxx='win32fe cl', which works >> > fine, but if I add some flags after cl the configure brakes, >> > complaining that the compiler does not work. >> > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the >> > same as before. >> > Is there a way to specify my own flags with Petsc (or add to them)? >> > >> >> --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" > > Generally CFLAGS should work. However with MS compilers - we have some > defaults without which the compilers might not work. [esp with mpi]. > So when changing CFLAGS one might have to include the defaults plus > the additional flags. > > However COPTFLAGS migh be easier to add to CFLAGS - and provided to > primarily specify optimization flags - but can be used for for other > flags aswell.. > > Satish > > From mgrabbani at gmail.com Thu Feb 17 12:22:14 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 17 Feb 2011 10:22:14 -0800 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: Thanks for reply. I have not used different PETSC_ARCH; in all the installations it has the same value of linux-gnu-c-debug, but the different installations are in completely separate folders and extra packages like lapack, mpi are also installed separately. Can I delete a folder containing a full installation and not affect my system? Thanks Golam On Wed, Feb 16, 2011 at 7:20 PM, Barry Smith wrote: > > If you just used different PETSC_ARCH variables like arch-complex arch-opt > etc you can just delete those directories and that will remove everything > related to them. > > Barry > > On Feb 16, 2011, at 9:18 PM, Golam Rabbani wrote: > > > Hi, > > > > I am new to petsc and also new to linux/unix environment. > > Recently I started converting a matlab nanodevice simulaiton code into a > c code using petsc and have done most of the converting. > > But in the process I have installed petsc 4/5 times with different > configuration options (real, complex, with blopex, etc). Now that I > > do not need some of them, I want to uninstall them. Can I simply delete > the unwanted versions (all the versions are in separate folders)? > > Please advise me if something different is needed. > > > > Thanks in advance, > > Golam > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 17 12:26:20 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 17 Feb 2011 19:26:20 +0100 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 19:22, Golam Rabbani wrote: > Can I delete a folder containing a full installation and not affect my > system? Yes. For future reference, it is easier to stay current if you use different values of PETSC_ARCH instead of different PETSC_DIR. -------------- next part -------------- An HTML attachment was scrubbed... 
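A sketch of the single-tree workflow Barry and Jed describe above: one source directory, several PETSC_ARCH builds that can be added and removed independently. The arch names and configure options here are only examples.

cd $PETSC_DIR
./config/configure.py PETSC_ARCH=arch-real-debug --with-scalar-type=real
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-real-debug all
./config/configure.py PETSC_ARCH=arch-complex-opt --with-scalar-type=complex --with-debugging=0
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-complex-opt all
rm -rf $PETSC_DIR/arch-complex-opt   # deleting an arch directory removes everything built for it

Each application then selects a build by setting PETSC_DIR and PETSC_ARCH in its makefile or environment.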
URL: From mgrabbani at gmail.com Thu Feb 17 12:45:47 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 17 Feb 2011 10:45:47 -0800 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: Thanks, I will keep your advice in mind. On Thu, Feb 17, 2011 at 10:26 AM, Jed Brown wrote: > On Thu, Feb 17, 2011 at 19:22, Golam Rabbani wrote: > >> Can I delete a folder containing a full installation and not affect my >> system? > > > Yes. > > For future reference, it is easier to stay current if you use different > values of PETSC_ARCH instead of different PETSC_DIR. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Thu Feb 17 15:21:43 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 21:21:43 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> Message-ID: <201102172121.46026.juhaj@iki.fi> > Yes, if your BC do not give at least a locally unique solution, then your > Jacobian will > be rank deficient and Newton breaks down. You can still try Picard, but I > recommend > understanding what you mean by a solution first. Thanks for confirming. And for the suggestion to try Picard, but it simply shoots out to somewhere in the direction of Alpha Centauri or some such: reaches function values in excess of 1.e+34 in less than ten SNES iterations... Perhaps there is such a solution, but that is certainly not what I want. Especially since I think I figured out an alternative boundary condition. But I do not know how to implement that in PETSc. How do I require f'(xmax) = constant_A f(xmax) = constant_B and no condition (I could require f(xmin)=0, but that is exactly the non- condition I discovered) at xmin? I did not find any examples of how to do this and it does not seem to be straightforward. Do I need to convert from f, f', f'' to f, g, g' with g=f' to change the f'(xmax) condition to a Dirichlet one for g? But that does not seem to be feasible since I can not think of what equation g (or f') should obey in the interior - recall that I just have a single equation, F(f, f', f'') = 0. Cheers, -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From bsmith at mcs.anl.gov Thu Feb 17 15:41:57 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 17 Feb 2011 15:41:57 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102172121.46026.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> Message-ID: <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> On boundary points where you want your mathematical solution x*| at that point = a you need to use for your coded function f(x) = x - a. Its derivative is f'(x) = 1 which is nonzero is fine. If the derivative at other points is order K you can use f(x) = K*(x - a) so the derivate at that point is K. Barry On Feb 17, 2011, at 3:21 PM, Juha J?ykk? wrote: >> Yes, if your BC do not give at least a locally unique solution, then your >> Jacobian will >> be rank deficient and Newton breaks down. 
You can still try Picard, but I >> recommend >> understanding what you mean by a solution first. > > Thanks for confirming. And for the suggestion to try Picard, but it simply > shoots out to somewhere in the direction of Alpha Centauri or some such: > reaches function values in excess of 1.e+34 in less than ten SNES > iterations... Perhaps there is such a solution, but that is certainly not what > I want. > > Especially since I think I figured out an alternative boundary condition. But > I do not know how to implement that in PETSc. > > How do I require > > f'(xmax) = constant_A > f(xmax) = constant_B > > and no condition (I could require f(xmin)=0, but that is exactly the non- > condition I discovered) at xmin? > > I did not find any examples of how to do this and it does not seem to be > straightforward. Do I need to convert from f, f', f'' to f, g, g' with g=f' to > change the f'(xmax) condition to a Dirichlet one for g? But that does not seem > to be feasible since I can not think of what equation g (or f') should obey in > the interior - recall that I just have a single equation, F(f, f', f'') = 0. > > Cheers, > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From bsmith at mcs.anl.gov Thu Feb 17 15:58:06 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 17 Feb 2011 15:58:06 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297958784.14179.11.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: <5DE8FC2C-CE4A-42CC-AD1A-D86375730DAC@mcs.anl.gov> Ethan, It is perfectly possible to map from one global MPI vector to another global MPI vector. The vectors can be different or the same sizes and have different or the same layouts. It is just not possible to use DEFAULT is in the from and two positions at the same time. Reason: if you don't provide either it doesn't have a way of generating both that are compatible with each other. I will add an error check for that case. You should just generate the IS's that you need. Ignore Matt's ravings, you don't need to wrapping nothing in no SeqVec to scatter from MPI to MPI. Just provide the ISs. Barry On Feb 17, 2011, at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but apparently not. > Is it possible to create a general VecScatter from an arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with > the same global sizes (or same global IS sizes) but different local > sizes? Shouldn't this just be a matter of relying upon the implied > LocalToGlobalMapping? > > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > From juhaj at iki.fi Thu Feb 17 17:01:07 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 23:01:07 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> Message-ID: <201102172301.16497.juhaj@iki.fi> > On boundary points where you want your mathematical solution x*| at that > point = a you need to use for your coded function f(x) = x - a. Its > derivative is f'(x) = 1 which is nonzero is fine. If the derivative at > other points is order K you can use f(x) = K*(x - a) so the derivate at > that point is K. I am not sure, I understood this. Just to make sure there is no confusion with the notation, my unknown function be called f and my independent variable x and f is defined for 0 <= x <= 1. I use f' for the derivative of f. The nonlinear equation I want to solve is F(f,f',f'',x)=0. So, if I want f(1) = a and f'(1) = b, should I set the F(1) = b*(f-a) in the code? Will that not give 0 residual when f(1)=a regardless of it derivative? Or, alternatively, is my approach totally wrong to begin with? I took a step back and started to work with r f''/f - r (f'/f)^2 + f'/f = 0 only and cannot get it to converge any more than my actual problem. Now, for this I even know the general solution, so it should be easy to solve this for f(1)=1, f'(1)=2 (or 1/2, but that has singular derivative at 0, so perhaps it is not a good example). 
Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From gaurish108 at gmail.com Thu Feb 17 21:10:44 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Thu, 17 Feb 2011 22:10:44 -0500 Subject: [petsc-users] Least squares: using unpreconditioned LSQR Message-ID: Hi, I wanted to solve some least squares problems using PETSc. My test matrix is size 3x2 but I wish to use this code for solving large ill-conditioned rectangular systems later. Looking at the PETSc manual I the found the KSPLSQR routine which implements the LSQR algorithm. However I am unsure how to use this routine. I am pasting the lines of the code which I use to set up the solver. Through the terminal I pass the option -ksp_type lsqr while running exectuable. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); As you can see I have used PETSC_NULL for the preconditioner matrix since I wish to use the *unpreconditioned* version of the LSQR algorithm. This gives me an error. If I pass the matrix A it gives me an error again. I am not sure how to tell PETSc not to use a preconditioner. Could you please tell me how I should use KSPSetOperators statement in this case to use the unpreconditioned algorithm. If you have a better sparse matrix least squares algorithm implemented please let me know. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at mcs.anl.gov Thu Feb 17 21:19:42 2011 From: abhyshr at mcs.anl.gov (Shri) Date: Thu, 17 Feb 2011 21:19:42 -0600 (CST) Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: Message-ID: <1632480610.70640.1297999182044.JavaMail.root@zimbra.anl.gov> Use the option -pc_type none. ----- Original Message ----- Hi, I wanted to solve some least squares problems using PETSc. My test matrix is size 3x2 but I wish to use this code for solving large ill-conditioned rectangular systems later. Looking at the PETSc manual I the found the KSPLSQR routine which implements the LSQR algorithm. However I am unsure how to use this routine. I am pasting the lines of the code which I use to set up the solver. Through the terminal I pass the option -ksp_type lsqr while running exectuable. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); As you can see I have used PETSC_NULL for the preconditioner matrix since I wish to use the *unpreconditioned* version of the LSQR algorithm. This gives me an error. If I pass the matrix A it gives me an error again. I am not sure how to tell PETSc not to use a preconditioner. Could you pease tell me how I should use KSPSetOperators statement in this case to use the unpreconditioned algorithm. If you have a better sparse matrix least squares algorithm implemented please let me know. -------------- next part -------------- An HTML attachment was scrubbed... 
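Combining the advice in this thread (use -pc_type none, and, as the replies below also note, pass A itself as the preconditioner matrix rather than PETSC_NULL), the setup might look like the sketch below. It assumes Mat A and Vec b, x are already created and assembled as in the snippet above, and it is the programmatic equivalent of running with -ksp_type lsqr -pc_type none.

KSP            ksp;
PC             pc;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* pass A, not PETSC_NULL, as the Pmat */
ierr = KSPSetType(ksp,KSPLSQR);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCNONE);CHKERRQ(ierr);       /* unpreconditioned LSQR */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* command-line options can still override */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = KSPDestroy(ksp);CHKERRQ(ierr);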
URL: From knepley at gmail.com Thu Feb 17 21:20:47 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 21:20:47 -0600 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > Hi, > > I wanted to solve some least squares problems using PETSc. My test matrix > is size 3x2 but I wish to use this code for solving large ill-conditioned > rectangular systems later. > > Looking at the PETSc manual I the found the KSPLSQR routine which > implements the LSQR algorithm. > > However I am unsure how to use this routine. I am pasting the lines of the > code which I use to set up the solver. > > Through the terminal I pass the option -ksp_type lsqr while running > exectuable. > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > ierr = > KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > Pass A, not PETSC_NULL. Matt > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > As you can see I have used PETSC_NULL for the preconditioner matrix since I > wish to use the *unpreconditioned* version of the LSQR algorithm. This gives > me an error. > > If I pass the matrix A it gives me an error again. I am not sure how to > tell PETSc not to use a preconditioner. > > Could you please tell me how I should use KSPSetOperators statement in this > case to use the unpreconditioned algorithm. > > If you have a better sparse matrix least squares algorithm implemented > please let me know. > > > > > > > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 17 22:01:50 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 17 Feb 2011 22:01:50 -0600 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: You can use '-ksp_type lsqr -pc_type none' Hong On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > Hi, > > I wanted to solve some least squares problems using PETSc. My test matrix is > size 3x2 but I wish to use this code for solving large ill-conditioned > rectangular systems later. > > Looking at the PETSc manual I the found the KSPLSQR routine which implements > the LSQR algorithm. > > However I am unsure how to use this routine. I am pasting the lines of the > code which I use to set up the solver. > > Through the terminal I pass the option -ksp_type lsqr while running > exectuable. > > ?ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > ?ierr = > KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > ?ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > ?ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > As you can see I have used PETSC_NULL for the preconditioner matrix since I > wish to use the *unpreconditioned* version of the LSQR algorithm. This gives > me an error. > > If I pass the matrix A it gives me an error again. I am not sure how to tell > PETSc not to use a preconditioner. > > Could you please tell me how I should use KSPSetOperators statement in this > case to use the unpreconditioned algorithm. 
> > If you have a better sparse matrix least squares algorithm implemented > please let me know. > > > > > > > > > > > > > > > > > > > > From gaurish108 at gmail.com Thu Feb 17 22:40:50 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Thu, 17 Feb 2011 23:40:50 -0500 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: Thank you. On using A in place of Pmat position in the KSPSetOperators list of arguments, I was able to get the small test system to work. On Thu, Feb 17, 2011 at 10:20 PM, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > >> Hi, >> >> I wanted to solve some least squares problems using PETSc. My test matrix >> is size 3x2 but I wish to use this code for solving large ill-conditioned >> rectangular systems later. >> >> Looking at the PETSc manual I the found the KSPLSQR routine which >> implements the LSQR algorithm. >> >> However I am unsure how to use this routine. I am pasting the lines of the >> code which I use to set up the solver. >> >> Through the terminal I pass the option -ksp_type lsqr while running >> exectuable. >> >> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >> >> ierr = >> KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >> > > Pass A, not PETSC_NULL. > > Matt > > >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> >> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >> >> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >> >> As you can see I have used PETSC_NULL for the preconditioner matrix since >> I wish to use the *unpreconditioned* version of the LSQR algorithm. This >> gives me an error. >> >> If I pass the matrix A it gives me an error again. I am not sure how to >> tell PETSc not to use a preconditioner. >> >> Could you please tell me how I should use KSPSetOperators statement in >> this case to use the unpreconditioned algorithm. >> >> If you have a better sparse matrix least squares algorithm implemented >> please let me know. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Fri Feb 18 01:26:18 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 02:26:18 -0500 Subject: [petsc-users] Running an iterative method for a large number of iterations: Possible blow up? Message-ID: Hi, I was trying to use LSQR algorithm for solving a least squares problem of size 2683x1274. I notice that if I allow the iterative method to run for a large number of iterations after it has converged (i.e. output of -ksp_monitor KSPresidualnorm seems constant upto the 4th digit) , some numbers in the answer vector seem to get inordinately large. I seem to get my answer comparable to Matlab after 951 iterations, but when I increase the number of iterations to 10000 some numbers seem very large. Is this expected? Also, how do I terminate my iteration when my residual norm seems constant? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Fri Feb 18 02:26:08 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 16:26:08 +0800 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
Message-ID: <59E77951AFD2405782C315F25FF16932@cogendaeda> Hi, After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? Or it has already done when MatAssembly is called? From loic.gouarin at math.u-psud.fr Fri Feb 18 03:22:48 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 10:22:48 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS Message-ID: <4D5E3A68.6020805@math.u-psud.fr> Hi, I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. I have done a parallel version and it works for very small size. But it's very slow. I use DA to create my grid. Here is an example DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); The matrix problem has the form [ A B1] [ B2 0 ] and the preconditioner is [ A 0] [ 0 M] I use fgmres to solve my system and I want to use MUMPS to solve the linear system Ax=b for the preconditioning step. I want to solve multiple time this problem with different second member. The first problem is when I call Mat A, B; DAGetMatrix(da, MATAIJ, &A); DAGetMatrix(da, MATAIJ, &B); It takes a long time in 3D and I don't understand why. I keep the debug version of Petsc but i don't think that it is the problem. After that MUMPS can't do the factorization for nv=33 in my grid definition (the case nv=19 works) because there is not enough memory. I have the error [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: Cannot allocate required memory 1083974153 megabytes But my problem is not very big for the moment! I want to be able to solve Stokes problem on a grid 128x128x128 for the velocity field. Here is my script to launch the program mpiexec -np 1 ./stokesPart \ -stokes_ksp_type fgmres \ -stokes_ksp_rtol 1e-6 \ -stokes_pc_type fieldsplit \ -stokes_pc_fieldsplit_block_size 4 \ -stokes_pc_fieldsplit_0_fields 0,1,2 \ -stokes_pc_fieldsplit_1_fields 3 \ -stokes_fieldsplit_0_ksp_type preonly \ -stokes_fieldsplit_0_pc_type lu \ -stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ -stokes_fieldsplit_1_ksp_type preonly \ -stokes_fieldsplit_1_pc_type jacobi \ -stokes_ksp_monitor_short I compile Petsc-3.1-p7 with the options --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 --download-mumps=yes --with-parmetis=1 --download-parmetis=yes --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 --download-scalapack=yes --with-blacs=1 --download-blacs=yes I think I have to put some MUMPS options but I don't know exactly what. Could you tell me what I do wrong? Best Regards, Loic -------------- next part -------------- A non-text attachment was scrubbed... Name: loic_gouarin.vcf Type: text/x-vcard Size: 551 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Fri Feb 18 03:49:09 2011 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 18 Feb 2011 10:49:09 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E3A68.6020805@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> Message-ID: Hey Loic, I think the problem is clear from the error message. 
You simply don't have enough memory to perform the LU decomposition. >From the info you provide I can see a numerous places here your code is using significant amounts of memory (without even considering the LU factorisation). 1) The DA you create uses a stencil width of 2. This is actually not required for your element type. Stencil width of one is sufficient. 2) DAGetMatrix is allocating memory of the (2,2) block (the zero matrix) in your stokes system. 3) The precondioner matrix B created with DAGetMatrix is allocating memory of the off-diagonal blocks (1,2) and (2,1) which you don't use. 4) Fieldsplit (additive) is copying the matrices associated with the (1,1) and (2,2) block from the preconditioner. Representing this complete element type of the DA is not a good idea, however representing the velocity space and pressure space on different DA's is fine. Doing this would allow a stencil width of one to be used for both velocity and pressure - which is that is required. You can connect the two DA's for velocity and pressure via DMComposite. Unfortunately, DMComposite cannot create and preallocate the off-diagonal matrices for you, but it can create and preallocate memory for the diagonal blocks. You would have to provide the preallocation routine for the off-diagonal blocks. I would recommend switching to petsc-dev as there is much more support for this type of "multi-physics" coupling. I doubt you will ever be able to solve your 128^3 problem using MUMPS to factor your (1,1) block. The memory required is simply to great, you will have to consider using a multilevel preconditioner. Can you solve your problem using ML or BoomerAMG? Cheers, Dave On 18 February 2011 10:22, gouarin wrote: > Hi, > > I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. > I have done a parallel version and it works for very small size. But it's > very slow. > > I use DA to create my grid. Here is an example > > DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, > ? ? ? ? ? ? ? ? ? nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, > ? ? ? ? ? ? ? ? ? 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); > > The matrix problem has the form > > [ A ? ?B1] > [ B2 ? 0 ] > > and the preconditioner is > > [ A ?0] > [ 0 ?M] > > I use fgmres to solve my system and I want to use MUMPS to solve the linear > system Ax=b for the preconditioning step. > I want to solve multiple time this problem with different second member. > > The first problem is when I call > > ?Mat A, B; > ?DAGetMatrix(da, MATAIJ, &A); > ?DAGetMatrix(da, MATAIJ, &B); > > It takes a long time in 3D and I don't understand why. I keep the debug > version of Petsc but i don't think that it is the problem. > > After that MUMPS can't do the factorization for nv=33 in my grid definition > (the case nv=19 works) because there is not enough memory. I have the error > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > Cannot allocate required memory 1083974153 megabytes > > But my problem is not very big for the moment! I want to be able to solve > Stokes problem on a grid 128x128x128 for the velocity field. > > Here is my script to launch the program > > mpiexec -np 1 ./stokesPart \ > ? ? -stokes_ksp_type fgmres \ > ? ? -stokes_ksp_rtol 1e-6 \ > ? ? -stokes_pc_type fieldsplit \ > ? ? -stokes_pc_fieldsplit_block_size 4 \ > ? ? -stokes_pc_fieldsplit_0_fields 0,1,2 \ > ? ? -stokes_pc_fieldsplit_1_fields 3 \ > ? ? -stokes_fieldsplit_0_ksp_type preonly \ > ? ? -stokes_fieldsplit_0_pc_type lu \ > ? ? 
-stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ > ? ? -stokes_fieldsplit_1_ksp_type preonly \ > ? ? -stokes_fieldsplit_1_pc_type jacobi \ > ? ? -stokes_ksp_monitor_short > > I compile Petsc-3.1-p7 with the options > > --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 > --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 > --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 > --download-mumps=yes --with-parmetis=1 --download-parmetis=yes > --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 > --download-scalapack=yes --with-blacs=1 --download-blacs=yes > > I think I have to put some MUMPS options but I don't know exactly what. > > Could you tell me what I do wrong? > > Best Regards, > > Loic > From loic.gouarin at math.u-psud.fr Fri Feb 18 04:45:08 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 11:45:08 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> Message-ID: <4D5E4DB4.5010706@math.u-psud.fr> Hi Dave, thanks for your quick reply. On 18/02/2011 10:49, Dave May wrote: > Hey Loic, > I think the problem is clear from the error message. > You simply don't have enough memory to perform the LU decomposition. > > > From the info you provide I can see a numerous places here your code > is using significant amounts of memory (without even considering the > LU factorisation). > > 1) The DA you create uses a stencil width of 2. This is actually not > required for your element type. Stencil width of one is sufficient. > > 2) DAGetMatrix is allocating memory of the (2,2) block (the zero > matrix) in your stokes system. > > 3) The precondioner matrix B created with DAGetMatrix is allocating > memory of the off-diagonal blocks (1,2) and (2,1) which you don't use. > > 4) Fieldsplit (additive) is copying the matrices associated with the > (1,1) and (2,2) block from the preconditioner. > > > Representing this complete element type of the DA is not a good idea, > however representing the velocity space and pressure space on > different DA's is fine. Doing this would allow a stencil width of one > to be used for both velocity and pressure - which is that is required. > You can connect the two DA's for velocity and pressure via > DMComposite. Unfortunately, DMComposite cannot create and preallocate > the off-diagonal matrices for you, but it can create and preallocate > memory for the diagonal blocks. You would have to provide the > preallocation routine for the off-diagonal blocks. > Ok. It's the first time that I use Petsc for a big problem and I don't yet see pretty well all the possibilities of Petsc. When you say: "You would have to provide the preallocation routine for the off-diagonal block", do you talk about DASetBlockFills or is it more complicated ? > I would recommend switching to petsc-dev as there is much more support > for this type of "multi-physics" coupling. > I see that DMDA is more advanced in petsc-dev. I'll try it. > I doubt you will ever be able to solve your 128^3 problem using MUMPS > to factor your (1,1) block. The memory required is simply to great, > you will have to consider using a multilevel preconditioner. > > Can you solve your problem using ML or BoomerAMG? > I tried different solvers. ML doesn't work. 
I used hypre with this script mpiexec -np 4 ./stokesPart \ -stokes_ksp_type minres \ -stokes_ksp_rtol 1e-6 \ -stokes_pc_type fieldsplit \ -stokes_pc_fieldsplit_block_size 4 \ -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE \ -stokes_pc_fieldsplit_0_fields 0,1,2 \ -stokes_pc_fieldsplit_1_fields 3 \ -stokes_fieldsplit_0_ksp_type richardson \ -stokes_fieldsplit_0_ksp_max_it 1 \ -stokes_fieldsplit_0_pc_type hypre \ -stokes_fieldsplit_0_pc_hypre_type boomeramg\ -stokes_fieldsplit_0_pc_hypre_boomeramg_max_iter 1 \ -stokes_fieldsplit_1_ksp_type preonly \ -stokes_fieldsplit_1_pc_type jacobi \ -stokes_ksp_monitor_short but once again, it's very slow or out of memory. Perhaps my options are not good ... The problem is that I don't have to solve Stokes problem just one time but multiple time so I have to do this as fast as possible. Thanks again, Loic > Cheers, > Dave > > > On 18 February 2011 10:22, gouarin wrote: >> Hi, >> >> I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. >> I have done a parallel version and it works for very small size. But it's >> very slow. >> >> I use DA to create my grid. Here is an example >> >> DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, >> nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, >> 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); >> >> The matrix problem has the form >> >> [ A B1] >> [ B2 0 ] >> >> and the preconditioner is >> >> [ A 0] >> [ 0 M] >> >> I use fgmres to solve my system and I want to use MUMPS to solve the linear >> system Ax=b for the preconditioning step. >> I want to solve multiple time this problem with different second member. >> >> The first problem is when I call >> >> Mat A, B; >> DAGetMatrix(da, MATAIJ,&A); >> DAGetMatrix(da, MATAIJ,&B); >> >> It takes a long time in 3D and I don't understand why. I keep the debug >> version of Petsc but i don't think that it is the problem. >> >> After that MUMPS can't do the factorization for nv=33 in my grid definition >> (the case nv=19 works) because there is not enough memory. I have the error >> >> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> Cannot allocate required memory 1083974153 megabytes >> >> But my problem is not very big for the moment! I want to be able to solve >> Stokes problem on a grid 128x128x128 for the velocity field. >> >> Here is my script to launch the program >> >> mpiexec -np 1 ./stokesPart \ >> -stokes_ksp_type fgmres \ >> -stokes_ksp_rtol 1e-6 \ >> -stokes_pc_type fieldsplit \ >> -stokes_pc_fieldsplit_block_size 4 \ >> -stokes_pc_fieldsplit_0_fields 0,1,2 \ >> -stokes_pc_fieldsplit_1_fields 3 \ >> -stokes_fieldsplit_0_ksp_type preonly \ >> -stokes_fieldsplit_0_pc_type lu \ >> -stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ >> -stokes_fieldsplit_1_ksp_type preonly \ >> -stokes_fieldsplit_1_pc_type jacobi \ >> -stokes_ksp_monitor_short >> >> I compile Petsc-3.1-p7 with the options >> >> --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 >> --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 >> --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 >> --download-mumps=yes --with-parmetis=1 --download-parmetis=yes >> --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 >> --download-scalapack=yes --with-blacs=1 --download-blacs=yes >> >> I think I have to put some MUMPS options but I don't know exactly what. >> >> Could you tell me what I do wrong? >> >> Best Regards, >> >> Loic >> -- Loic Gouarin Laboratoire de Math?matiques Universit? 
Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From dave.mayhem23 at gmail.com Fri Feb 18 05:07:30 2011 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 18 Feb 2011 12:07:30 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E4DB4.5010706@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> Message-ID: > Ok. It's the first time that I use Petsc for a big problem and I don't yet > see pretty well all the possibilities of Petsc. > When you say: "You would have to provide the preallocation routine for the > off-diagonal block", do you talk about DASetBlockFills or is it more > complicated ? Not exactly. I meant if you use DMComposite, you would have to provide the preallocation routines for the off-diagonal blocks. If you continue to use a single to represent u,v,w,p, then I think DASetBlockFills() would let you control which chunks in the operator and preconditioner get allocated when you call DAGetMatrix. > I tried different solvers. ML doesn't work. I used hypre with this script > > mpiexec -np 4 ./stokesPart \ > ? ? -stokes_ksp_type minres \ > ? ? -stokes_ksp_rtol 1e-6 \ > ? ? -stokes_pc_type fieldsplit \ > ? ? -stokes_pc_fieldsplit_block_size 4 \ > ? ? -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE \ > ? ? -stokes_pc_fieldsplit_0_fields 0,1,2 \ > ? ? -stokes_pc_fieldsplit_1_fields 3 \ > ? ? -stokes_fieldsplit_0_ksp_type richardson \ > ? ? -stokes_fieldsplit_0_ksp_max_it 1 \ > ? ? -stokes_fieldsplit_0_pc_type hypre \ > ? ? -stokes_fieldsplit_0_pc_hypre_type boomeramg\ > ? ? -stokes_fieldsplit_0_pc_hypre_boomeramg_max_iter 1 \ > ? ? -stokes_fieldsplit_1_ksp_type preonly \ > ? ? -stokes_fieldsplit_1_pc_type jacobi \ > ? ? -stokes_ksp_monitor_short > > but once again, it's very slow or out of memory. Perhaps my options are not > good ... > How much memory is used when you use -stokes_fieldsplit_0_ksp_max_it 1 -stokes_fieldsplit_0_pc_type jacobi ? It's possible that the copy of the diagonal blocks occurring when you invoke Fieldsplit just by itself is using all your available memory. I wouldn't be surprised with a stencil width of 2.... From zonexo at gmail.com Fri Feb 18 05:20:16 2011 From: zonexo at gmail.com (TAY wee-beng) Date: Fri, 18 Feb 2011 12:20:16 +0100 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries Message-ID: <4D5E55F0.8000407@gmail.com> Hi, I am trying to solve the Navier Stokes momentum equation of a moving body. For most points (a), I will be using the north/south/east/west locations to form the equation. However, for some points (b), due to the moving body, I will be using some interpolation schemes. At different time step, the interpolation template will be different for these points. Hence, I will use different neighboring points to form the equation. Moreover, points (a) can change to points (b) and vice versa. I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. But in the manual, it states that "For sparse matrices this routine retains the old nonzero structure. ". However for my case, the template is different at different time step. Hence what is the most efficient procedure? -- Yours sincerely, TAY wee-beng From knepley at gmail.com Fri Feb 18 05:44:59 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 05:44:59 -0600 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
In-Reply-To: <59E77951AFD2405782C315F25FF16932@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: 2011/2/18 Gong Ding > Hi, > After update my FVM code to support higher order, I have to preallocate > more memory when creating the matrix. However, only a few cells (determined > at run time) needed to be high order, thus preallocated memory is overkill > too much. > > Is it possible to add a function to reassemble the AIJ matrix to free the > extra memory? > Or it has already done when MatAssembly is called? > This is done during MatAssemblyEnd(). However, there is no guarantee that the operating system actually returns that memory to general use. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 18 05:57:11 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 05:57:11 -0600 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries In-Reply-To: <4D5E55F0.8000407@gmail.com> References: <4D5E55F0.8000407@gmail.com> Message-ID: On Fri, Feb 18, 2011 at 5:20 AM, TAY wee-beng wrote: > Hi, > > I am trying to solve the Navier Stokes momentum equation of a moving body. > > For most points (a), I will be using the north/south/east/west locations to > form the equation. > > However, for some points (b), due to the moving body, I will be using some > interpolation schemes. At different time step, the interpolation template > will be different for these points. Hence, I will use different neighboring > points to form the equation. > > Moreover, points (a) can change to points (b) and vice versa. > > I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. > But in the manual, it states that "For sparse matrices this routine retains > the old nonzero structure. ". However for my case, the template is different > at different time step. > > Hence what is the most efficient procedure? > There is no efficiently, updateable data structure in PETSc, since this would be much lower performance for general use. I suggest using a matrix-free application of your full operator, and a fixed sparsity operator for your preconditioner. Alternatively, you can rebuild the matrix structure at each step, which might be the best option depending on how much work you do in each solve. Matt > -- > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.gouarin at math.u-psud.fr Fri Feb 18 06:00:09 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 13:00:09 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> Message-ID: <4D5E5F49.9090001@math.u-psud.fr> On 18/02/2011 12:07, Dave May wrote: > > How much memory is used when you use > -stokes_fieldsplit_0_ksp_max_it 1 > -stokes_fieldsplit_0_pc_type jacobi > ? > It's possible that the copy of the diagonal blocks occurring when you > invoke Fieldsplit just by itself is using all your available memory. I > wouldn't be surprised with a stencil width of 2.... 
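As a point of reference before the memory log below, a rough sketch of the split layout Dave suggests (one DA for the three velocity components, one for pressure, both with stencil width 1), written against the PETSc 3.1 DACreate3d call quoted earlier in this thread; nv is the same placeholder grid size, and the DMComposite coupling plus the off-diagonal preallocation are deliberately left out.

#include "petscda.h"

/* Sketch only: two DAs with stencil width 1 instead of one 4-dof DA with width 2. */
PetscErrorCode create_stokes_das(PetscInt nv, DA *da_vel, DA *da_p)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* 3 velocity dofs per node */
  ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                    nv,nv,nv,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                    3,1,PETSC_NULL,PETSC_NULL,PETSC_NULL,da_vel);CHKERRQ(ierr);
  /* 1 pressure dof per node */
  ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                    nv,nv,nv,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                    1,1,PETSC_NULL,PETSC_NULL,PETSC_NULL,da_p);CHKERRQ(ierr);
  /* DAGetMatrix(*da_vel,MATAIJ,&A) and DAGetMatrix(*da_p,MATAIJ,&M) then give the
     preallocated diagonal blocks; the B1/B2 blocks still need hand preallocation. */
  PetscFunctionReturn(0);
}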
This is the memory info given by the log_summary for nv=19 Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0 Index Set 30 24 96544 0 IS L to G Mapping 4 0 0 0 Vec 46 17 338344 0 Vec Scatter 12 0 0 0 Matrix 22 0 0 0 Distributed array 2 0 0 0 Preconditioner 3 0 0 0 Krylov Solver 3 0 0 0 ======================================================================================================================== and the malloc_info ------------------------------------------ [0] Maximum memory PetscMalloc()ed 184348608 maximum size of entire process 225873920 [0] Memory usage sorted by function [0] 2 3216 ClassPerfLogCreate() [0] 2 1616 ClassRegLogCreate() [0] 6 9152 DACreate() [0] 17 114128 DACreate_3D() [0] 3 48 DAGetCoordinateDA() [0] 10 265632 DAGetMatrix3d_MPIAIJ() [0] 3 48 DASetVertexDivision() [0] 2 6416 EventPerfLogCreate() [0] 1 12800 EventPerfLogEnsureSize() [0] 2 1616 EventRegLogCreate() [0] 1 3200 EventRegLogRegister() [0] 12 329376 ISAllGather() [0] 50 89344 ISCreateBlock() [0] 25 354768 ISCreateGeneral() [0] 60 7920 ISCreateStride() [0] 12 161728 ISGetIndices_Stride() [0] 2 21888 ISLocalToGlobalMappingBlock() [0] 2 21888 ISLocalToGlobalMappingCreate() [0] 12 1728 ISLocalToGlobalMappingCreateNC() [0] 9 2544 KSPCreate() [0] 1 16 KSPCreate_MINRES() [0] 1 16 KSPCreate_Richardson() [0] 3 48 KSPDefaultConvergedCreate() [0] 66 41888 MatCreate() [0] 6 960 MatCreate_MPIAIJ() [0] 16 5632 MatCreate_SeqAIJ() [0] 4 12000 MatGetRow_MPIAIJ() [0] 4 64 MatGetSubMatrices_MPIAIJ() [0] 160 941760 MatGetSubMatrices_MPIAIJ_Local() [0] 4 121664 MatGetSubMatrix_MPIAIJ_Private() [0] 16 304000 MatMarkDiagonal_SeqAIJ() [0] 80 181061344 MatSeqAIJSetPreallocation_SeqAIJ() [0] 12 113792 MatSetUpMultiply_MPIAIJ() [0] 12 288 MatStashCreate_Private() [0] 50 864 MatStashScatterBegin_Private() [0] 120 108096 MatZeroRows_MPIAIJ() [0] 10 182560 Mat_CheckInode() [0] 9 1776 PCCreate() [0] 1 144 PCCreate_FieldSplit() [0] 2 64 PCCreate_Jacobi() [0] 4 192 PCFieldSplitSetFields_FieldSplit() [0] 1 16 PCSetFromOptions_FieldSplit() [0] 5 22864 PCSetUp_FieldSplit() [0] 4 64 PetscCommDuplicate() [0] 1 4112 PetscDLLibraryOpen() [0] 6 24576 PetscDLLibraryRetrieve() [0] 45 1712 PetscDLLibrarySym() [0] 579 27792 PetscFListAdd() [0] 48 2112 PetscGatherMessageLengths() [0] 52 832 PetscGatherNumberOfMessages() [0] 90 4320 PetscLayoutCreate() [0] 64 1392 PetscLayoutSetUp() [0] 4 64 PetscLogPrintSummary() [0] 12 384 PetscMaxSum() [0] 24 6528 PetscOListAdd() [0] 28 1792 PetscObjectSetState() [0] 8 192 PetscOptionsGetEList() [0] 16 4842288 PetscPostIrecvInt() [0] 12 4842224 PetscPostIrecvScalar() [0] 0 32 PetscPushSignalHandler() [0] 1 432 PetscStackCreate() [0] 1798 54816 PetscStrallocpy() [0] 30 248832 PetscStrreplace() [0] 2 45888 PetscTableAdd() [0] 24 446528 PetscTableCreate() [0] 3 96 PetscTokenCreate() [0] 1 16 PetscViewerASCIIMonitorCreate() [0] 1 16 PetscViewerASCIIOpen() [0] 3 496 PetscViewerCreate() [0] 1 64 PetscViewerCreate_ASCII() [0] 2 528 StackCreate() [0] 2 1008 StageLogCreate() [0] 6 14400 User provided function() [0] 138 58880 VecCreate() [0] 66 1401952 VecCreate_MPI_Private() [0] 7 221312 VecCreate_Seq() [0] 9 288 VecCreate_Seq_Private() [0] 6 160 VecDuplicateVecs_Default() [0] 3 3008 VecGetArray3d() [0] 42 49536 VecScatterCreate() [0] 16 512 VecScatterCreateCommon_PtoS() [0] 20 213024 VecScatterCreate_PtoP() [0] 252 881536 VecScatterCreate_PtoS() [0] 74 1184 VecStashCreate_Private() -- Loic Gouarin 
Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From gdiso at ustc.edu Fri Feb 18 06:04:51 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 20:04:51 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> ----- Original Message ----- From: "Matthew Knepley" To: "PETSc users list" Sent: Friday, February 18, 2011 7:44 PM Subject: Re: [petsc-users] Is it possible to free extra memory after matassemble? > 2011/2/18 Gong Ding > >> Hi, >> After update my FVM code to support higher order, I have to preallocate >> more memory when creating the matrix. However, only a few cells (determined >> at run time) needed to be high order, thus preallocated memory is overkill >> too much. >> >> Is it possible to add a function to reassemble the AIJ matrix to free the >> extra memory? >> Or it has already done when MatAssembly is called? >> > > This is done during MatAssemblyEnd(). However, there is no guarantee that > the operating system > actually returns that memory to general use. > > Matt Could you please point out where can I find the MatAssemblyEnd routine for sequence AIJ matrix? I'd like to take a look at it. From knepley at gmail.com Fri Feb 18 06:27:22 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 06:27:22 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> Message-ID: On Fri, Feb 18, 2011 at 6:04 AM, Gong Ding wrote: > > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Friday, February 18, 2011 7:44 PM > Subject: Re: [petsc-users] Is it possible to free extra memory after > matassemble? > > > > 2011/2/18 Gong Ding > > > >> Hi, > >> After update my FVM code to support higher order, I have to preallocate > >> more memory when creating the matrix. However, only a few cells > (determined > >> at run time) needed to be high order, thus preallocated memory is > overkill > >> too much. > >> > >> Is it possible to add a function to reassemble the AIJ matrix to free > the > >> extra memory? > >> Or it has already done when MatAssembly is called? > >> > > > > This is done during MatAssemblyEnd(). However, there is no guarantee that > > the operating system > > actually returns that memory to general use. > > > > Matt > > Could you please point out where can I find the MatAssemblyEnd routine for > sequence AIJ matrix? > I'd like to take a look at it. > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/mat/impls/aij/seq/aij.c.html#MatAssemblyEnd_SeqAIJ Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Fri Feb 18 06:29:18 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 20:29:18 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? 
References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> ----- Original Message ----- From: "Matthew Knepley" To: "PETSc users list" Sent: Friday, February 18, 2011 7:44 PM Subject: Re: [petsc-users] Is it possible to free extra memory after matassemble? > 2011/2/18 Gong Ding > >> Hi, >> After update my FVM code to support higher order, I have to preallocate >> more memory when creating the matrix. However, only a few cells (determined >> at run time) needed to be high order, thus preallocated memory is overkill >> too much. >> >> Is it possible to add a function to reassemble the AIJ matrix to free the >> extra memory? >> Or it has already done when MatAssembly is called? >> > > This is done during MatAssemblyEnd(). However, there is no guarantee that > the operating system > actually returns that memory to general use. > > Matt I had checked the function MatAssemblyEnd_SeqAIJ in aij.c. It seems it only pack the a, i, j array, but didn't free memory. I guess one should malloc three new array with exact size and copy values to the new one, and then free the old a, i, j array? From knepley at gmail.com Fri Feb 18 06:57:48 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 06:57:48 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> Message-ID: On Fri, Feb 18, 2011 at 6:29 AM, Gong Ding wrote: > > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Friday, February 18, 2011 7:44 PM > Subject: Re: [petsc-users] Is it possible to free extra memory after > matassemble? > > > > 2011/2/18 Gong Ding > > > >> Hi, > >> After update my FVM code to support higher order, I have to preallocate > >> more memory when creating the matrix. However, only a few cells > (determined > >> at run time) needed to be high order, thus preallocated memory is > overkill > >> too much. > >> > >> Is it possible to add a function to reassemble the AIJ matrix to free > the > >> extra memory? > >> Or it has already done when MatAssembly is called? > >> > > > > This is done during MatAssemblyEnd(). However, there is no guarantee that > > the operating system > > actually returns that memory to general use. > > > > Matt > > I had checked the function MatAssemblyEnd_SeqAIJ in aij.c. > It seems it only pack the a, i, j array, but didn't free memory. > I guess one should malloc three new array with exact size and copy values > to the new one, and then free the old a, i, j array? > If you want that, just do MatCopy(). Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 18 08:00:12 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:00:12 -0600 Subject: [petsc-users] Running an iterative method for a large number of iterations: Possible blow up? In-Reply-To: References: Message-ID: Are you using petsc-dev? If not you should switch to it, it has an additional convergence test based on the residual of the normal equations. 
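Barry's remark refers to a test that ships with petsc-dev; with a stock 3.1 build, one user-side way to stop "when the residual norm looks constant" is a custom convergence test installed with KSPSetConvergenceTest. The sketch below only checks stagnation of the residual norm reported to the monitor; the StagCtx struct, the function name and the 1e-8 threshold are all made up for illustration, and this is not the normal-equations test mentioned above.

#include "petscksp.h"

typedef struct { PetscReal last_rnorm; } StagCtx;   /* illustrative context, initialize to 0 */

/* Declare convergence once the residual norm changes very little between iterations. */
PetscErrorCode StagnationConverged(KSP ksp,PetscInt it,PetscReal rnorm,
                                   KSPConvergedReason *reason,void *ctx)
{
  StagCtx *sc = (StagCtx*)ctx;

  PetscFunctionBegin;
  *reason = KSP_CONVERGED_ITERATING;
  if (it > 0 && sc->last_rnorm > 0.0 &&
      PetscAbsReal(sc->last_rnorm - rnorm) < 1.e-8*sc->last_rnorm) {
    *reason = KSP_CONVERGED_RTOL;                  /* residual has stagnated, stop */
  }
  sc->last_rnorm = rnorm;
  PetscFunctionReturn(0);
}

/* Installed (3.1-era signature) with:
   StagCtx ctx = {0.0};
   ierr = KSPSetConvergenceTest(ksp,StagnationConverged,&ctx,PETSC_NULL);CHKERRQ(ierr); */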
Barry One possible explanation for the large values is that they are in the null space of the operator and though they don't increase the residual norm the solution just accumulates them after a large number of iterations. On Feb 18, 2011, at 1:26 AM, Gaurish Telang wrote: > Hi, > > I was trying to use LSQR algorithm for solving a least squares problem of size 2683x1274. I notice that if I allow the iterative method to run for a large number of iterations > after it has converged (i.e. output of -ksp_monitor KSPresidualnorm seems constant upto the 4th digit) , some numbers in the answer vector seem to get inordinately large. > > > I seem to get my answer comparable to Matlab after 951 iterations, but when I increase the number of iterations to 10000 some numbers seem very large. > > Is this expected? Also, how do I terminate my iteration when my residual norm seems constant? > > Thanks From bsmith at mcs.anl.gov Fri Feb 18 08:04:13 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:04:13 -0600 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries In-Reply-To: References: <4D5E55F0.8000407@gmail.com> Message-ID: <068DC357-A149-4166-8F76-B94F5A93EE8A@mcs.anl.gov> On Feb 18, 2011, at 5:57 AM, Matthew Knepley wrote: > On Fri, Feb 18, 2011 at 5:20 AM, TAY wee-beng wrote: > Hi, > > I am trying to solve the Navier Stokes momentum equation of a moving body. > > For most points (a), I will be using the north/south/east/west locations to form the equation. > > However, for some points (b), due to the moving body, I will be using some interpolation schemes. At different time step, the interpolation template will be different for these points. Hence, I will use different neighboring points to form the equation. > > Moreover, points (a) can change to points (b) and vice versa. > > I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. But in the manual, it states that "For sparse matrices this routine retains the old nonzero structure. ". However for my case, the template is different at different time step. > > Hence what is the most efficient procedure? > > There is no efficiently, updateable data structure in PETSc, since this would be much lower performance for general use. > > I suggest using a matrix-free application of your full operator, and a fixed sparsity operator for your preconditioner. Alternatively, > you can rebuild the matrix structure at each step, which might be the best option depending on how much work you do in each solve. Here Matt means simply create a new Mat each time the nonzero structure will change and properly preallocate it each time. The additional cost of the new creation is at most a few percent of a run and the code is easier to maintain. Barry > > Matt > > -- > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From bsmith at mcs.anl.gov Fri Feb 18 08:11:38 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:11:38 -0600 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
In-Reply-To: <59E77951AFD2405782C315F25FF16932@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From gdiso at ustc.edu Fri Feb 18 08:20:23 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 22:20:23 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: Hi, I had added following code to aij.c in the function MatAssemblyEnd_SeqAIJ after pack the matrix elements. Just allocate three new array and copy data to them. This skill is heavily used in macro MatSeqXAIJReallocateAIJ if( a->maxnz != a->nz ) { ierr = PetscMalloc3(a->nz,MatScalar,&new_a,a->nz,PetscInt,&new_j,A->rmap->n+1,PetscInt,&new_i);CHKERRQ(ierr); ierr = PetscMemcpy(new_a,a->a,a->nz*sizeof(MatScalar));CHKERRQ(ierr); ierr = PetscMemcpy(new_i,a->i,(A->rmap->n+1)*sizeof(PetscInt));CHKERRQ(ierr); ierr = PetscMemcpy(new_j,a->j,a->nz*sizeof(PetscInt));CHKERRQ(ierr); ierr = MatSeqXAIJFreeAIJ(A,&a->a,&a->j,&a->i);CHKERRQ(ierr); a->a = new_a; a->i = new_i; a->j = new_j; a->maxnz = a->nz; } It seems work well. However, Barry, do you think it has some problem? On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. 
If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From bsmith at mcs.anl.gov Fri Feb 18 08:23:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:23:29 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102172301.16497.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> <201102172301.16497.juhaj@iki.fi> Message-ID: <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> I don't know how to handle the f'(1) = b. I was always taught to first introduce new variables to reduce the problem to a first order equation. For example let g = f' and the new problem is F(f,g,g') = 0 with the additional equations g = f' now there are no second derivatives. Barry On Feb 17, 2011, at 5:01 PM, Juha J?ykk? wrote: >> On boundary points where you want your mathematical solution x*| at that >> point = a you need to use for your coded function f(x) = x - a. Its >> derivative is f'(x) = 1 which is nonzero is fine. If the derivative at >> other points is order K you can use f(x) = K*(x - a) so the derivate at >> that point is K. > > I am not sure, I understood this. Just to make sure there is no confusion with > the notation, my unknown function be called f and my independent variable x > and f is defined for 0 <= x <= 1. I use f' for the derivative of f. The > nonlinear equation I want to solve is F(f,f',f'',x)=0. > > So, if I want f(1) = a and f'(1) = b, should I set the F(1) = b*(f-a) in the > code? Will that not give 0 residual when f(1)=a regardless of it derivative? > > Or, alternatively, is my approach totally wrong to begin with? I took a step > back and started to work with > > r f''/f - r (f'/f)^2 + f'/f = 0 > > only and cannot get it to converge any more than my actual problem. Now, for > this I even know the general solution, so it should be easy to solve this for > f(1)=1, f'(1)=2 (or 1/2, but that has singular derivative at 0, so perhaps it > is not a good example). > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From bsmith at mcs.anl.gov Fri Feb 18 08:26:23 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:26:23 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: On Feb 18, 2011, at 8:20 AM, Gong Ding wrote: > Hi, > I had added following code to aij.c in the function MatAssemblyEnd_SeqAIJ after pack the matrix elements. > Just allocate three new array and copy data to them. 
> This skill is heavily used in macro MatSeqXAIJReallocateAIJ > if( a->maxnz != a->nz ) { > ierr = PetscMalloc3(a->nz,MatScalar,&new_a,a->nz,PetscInt,&new_j,A->rmap->n+1,PetscInt,&new_i);CHKERRQ(ierr); > ierr = PetscMemcpy(new_a,a->a,a->nz*sizeof(MatScalar));CHKERRQ(ierr); > ierr = PetscMemcpy(new_i,a->i,(A->rmap->n+1)*sizeof(PetscInt));CHKERRQ(ierr); > ierr = PetscMemcpy(new_j,a->j,a->nz*sizeof(PetscInt));CHKERRQ(ierr); > ierr = MatSeqXAIJFreeAIJ(A,&a->a,&a->j,&a->i);CHKERRQ(ierr); > a->a = new_a; > a->i = new_i; > a->j = new_j; > a->maxnz = a->nz; > } > > It seems work well. However, Barry, do you think it has some problem? Yes 1) you still have the HUGE grabbing of memory during the initial allocation of the matrix (sometimes with virtual memory this may not hurt you but sometimes it will). 2) processes rarely actually return memory they've malloced back to the underlying OS so the program is still sitting on all that memory (sometimes because of virtual memory this may not hurt you). Admittedly this is much easier than doing the preallocation correctly so if it works for you then great. Barry > > > On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > >> Hi, >> After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. > > You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? > > The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. > > Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. > > Barry > > >> >> Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? >> Or it has already done when MatAssembly is called? >> >> > From gdiso at ustc.edu Fri Feb 18 08:31:29 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 22:31:29 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: Thank Barry, but at the moment I only want to loop cells once. The "loop" operation is very time consuming because I use AD to build the Jacobian matrix and there are millons of cells. Anyway, I'd like to allocate enough memory and free extra memory before solving the matrix (usually by direct solver such as MUMPS). > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. 
Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From fpacull at fluorem.com Fri Feb 18 09:22:09 2011 From: fpacull at fluorem.com (francois pacull) Date: Fri, 18 Feb 2011 16:22:09 +0100 Subject: [petsc-users] ILU, ASM, GMRES and memory Message-ID: <4D5E8EA1.8090809@fluorem.com> Dear PETSc team, I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: 1 - the local rows of Amat (ksp&pc's linear system matrix) 2 - the local rows of Pmat (ksp&pc's precond matrix) 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) 4 - the incomplete factorization of P[i] (subpc's ILU matrix) Is it correct? If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? Thanks for your help, francois. From jdbst21 at gmail.com Fri Feb 18 09:36:12 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Fri, 18 Feb 2011 10:36:12 -0500 Subject: [petsc-users] Mumps Message-ID: Hello, I have been have problems using Mumps on a large (O(1M) sparse martrix) on multiple cores. I first used the Petsc interface but it would hang sometime. Therefore, I wrote it now using MUMPS's C interface. The code works for one or two cores, but again hangs in the factorization stage using 4 cores. Note: That mumps, BLASC, and Scalpack was installed with petsc using --download Compiled using intel 11.1 If anyone has any ideal, I would really appreciate some feedback. Thank you Josh -------------- next part -------------- An HTML attachment was scrubbed... 
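To make the two-pass recipe Barry describes earlier in this thread concrete (count nonzeros per row, preallocate exactly, then fill), here is a rough sequential sketch; the per-row counts, the hypothetical row_is_high_order test and the wrapper name are placeholders, and in parallel MatCreateMPIAIJ with d_nnz/o_nnz arrays plays the same role as MatCreateSeqAIJ does here.

#include "petscmat.h"

/* Pass 1: count nonzeros per row.  Pass 2: create with exact preallocation and fill. */
PetscErrorCode build_fvm_matrix(PetscInt nrows, Mat *A)
{
  PetscInt       *nnz,i;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscMalloc(nrows*sizeof(PetscInt),&nnz);CHKERRQ(ierr);
  for (i=0; i<nrows; i++) {
    nnz[i] = 7;                 /* placeholder: low-order 7-point stencil            */
    /* if (row_is_high_order(i)) nnz[i] = 19;   hypothetical per-cell decision       */
  }
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,nrows,nrows,0,nnz,A);CHKERRQ(ierr);
  ierr = PetscFree(nnz);CHKERRQ(ierr);
  for (i=0; i<nrows; i++) {
    /* Pass 2: MatSetValues(*A,...) with exactly the entries counted above. */
  }
  ierr = MatAssemblyBegin(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}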
URL: From bsmith at mcs.anl.gov Fri Feb 18 09:42:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 09:42:29 -0600 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: <4D5E8EA1.8090809@fluorem.com> References: <4D5E8EA1.8090809@fluorem.com> Message-ID: On Feb 18, 2011, at 9:22 AM, francois pacull wrote: > Dear PETSc team, > > I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. > > So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: > 1 - the local rows of Amat (ksp&pc's linear system matrix) > 2 - the local rows of Pmat (ksp&pc's precond matrix) > 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) > 4 - the incomplete factorization of P[i] (subpc's ILU matrix) > > Is it correct? > If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function PCASMFreeSpace(PC pc) { PC_ASM *osm = (PC_ASM*)pc->data; PetscErrorCode ierr; if (osm->pmat) { if (osm->n_local_true > 0) { ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); } osm->pmat = 0; } if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} return 0; } run make in that directory. Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). report any problems to petsc-maint at mcs.anl.gov > When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) > From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" > > Thanks for your help, > francois. > > From dominik at itis.ethz.ch Fri Feb 18 09:43:56 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Fri, 18 Feb 2011 16:43:56 +0100 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: I have manage to smuggle my options along with COPTFLAGS, but not CFLAGS. The latter seems ignored (cygwin + windows). I tried exporting them as shell variables as well as attaching them to the command line before ./configure - in either case they are not to be found in configure.log. Minor issue (just multiple compiler warnings about overwritten switches), but if you still have a clean solution I would be glad to learn it for future. 
Many thanks and regards, Dominik > On Thu, Feb 17, 2011 at 7:00 PM, Satish Balay wrote: >> On Thu, 17 Feb 2011, Matthew Knepley wrote: >> >>> On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: >>> >>> > I need to use some special compile flags when compiling with 'cl' on >>> > Windows. >>> > While configuring I currently use --with-cxx='win32fe cl', which works >>> > fine, but if I add some flags after cl the configure brakes, >>> > complaining that the compiler does not work. >>> > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the >>> > same as before. >>> > Is there a way to specify my own flags with Petsc (or add to them)? >>> > >>> >>> --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" >> >> Generally CFLAGS should work. However with MS compilers - we have some >> defaults without which the compilers might not work. [esp with mpi]. >> So when changing CFLAGS one might have to include the defaults plus >> the additional flags. >> >> However COPTFLAGS migh be easier to add to CFLAGS - and provided to >> primarily specify optimization flags - but can be used for for other >> flags aswell.. >> >> Satish >> >> > From knepley at gmail.com Fri Feb 18 09:58:21 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 09:58:21 -0600 Subject: [petsc-users] Mumps In-Reply-To: References: Message-ID: On Fri, Feb 18, 2011 at 9:36 AM, Joshua Booth wrote: > Hello, > > I have been have problems using Mumps on a large (O(1M) sparse martrix) on > multiple cores. I first used the Petsc interface but it would hang > sometime. > Therefore, I wrote it now using MUMPS's C interface. > The code works for one or two cores, but again hangs in the factorization > stage using 4 cores. > Have you verified that this is actually hanging (in PETSc and MUMPS) and not just slow? These factorizations can have a large amount of fill and thus can get very slow and use a lot of memory for matrices this large. Matt > Note: That mumps, BLASC, and Scalpack was installed with petsc using > --download > Compiled using intel 11.1 > > If anyone has any ideal, I would really appreciate some feedback. > > Thank you > > Josh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 18 10:00:22 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 18 Feb 2011 10:00:22 -0600 Subject: [petsc-users] Mumps In-Reply-To: References: Message-ID: Joshua : > > I have been have problems using Mumps on a large (O(1M) sparse martrix) on > multiple cores.? I first used the Petsc interface but it would hang > sometime. "Petsc interface"? Do you mean Petsc LU sequential solver or petsc-mumps interface? > Therefore, I wrote it now using MUMPS's C interface. > The code works for one or two cores, but again hangs in the factorization > stage using 4 cores. Do you mean using MUMPS's C interface without petsc? > > Note:? That mumps, BLASC, and Scalpack was installed with petsc using > --download > Compiled using intel 11.1 > > If anyone has any ideal, I would really appreciate some feedback. > You need figure out where it hangs. Does it hang on smaller size problems? I rarely see hang from MUMPS. 
Try increasing fill ratio -mat_mumps_icntl_14 <20>: ICNTL(14): percentage of estimated workspace increase (None) Hong Hong > Thank you > > Josh > From bsmith at mcs.anl.gov Fri Feb 18 10:21:44 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 10:21:44 -0600 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: References: <4D5E8EA1.8090809@fluorem.com> Message-ID: <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> You will still need to delete your reference to the pmat by calling MatDestroy(pmat); after you have called KSPSetOperators(). Barry On Feb 18, 2011, at 9:42 AM, Barry Smith wrote: > > On Feb 18, 2011, at 9:22 AM, francois pacull wrote: > >> Dear PETSc team, >> >> I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. >> >> So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: >> 1 - the local rows of Amat (ksp&pc's linear system matrix) >> 2 - the local rows of Pmat (ksp&pc's precond matrix) >> 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) >> 4 - the incomplete factorization of P[i] (subpc's ILU matrix) >> >> Is it correct? >> If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? > > You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function > > PCASMFreeSpace(PC pc) > { > PC_ASM *osm = (PC_ASM*)pc->data; > PetscErrorCode ierr; > > if (osm->pmat) { > if (osm->n_local_true > 0) { > ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); > } > osm->pmat = 0; > } > if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} > return 0; > } > run make in that directory. > > Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). > > report any problems to petsc-maint at mcs.anl.gov > > >> When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? > > Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) > >> From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? > > Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" > > >> >> Thanks for your help, >> francois. >> >> > From gaurish108 at gmail.com Fri Feb 18 11:04:21 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 12:04:21 -0500 Subject: [petsc-users] value of PETSC_ARCH Message-ID: Hi to install PETSc without the debugging version, what value of PETSC_ARCH should I give? Or is this automatically decided by PETSc during the configure sterp? I know that during configure I should pass --with-debugging=no -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean at mcs.anl.gov Fri Feb 18 11:09:44 2011 From: sean at mcs.anl.gov (Sean Farley) Date: Fri, 18 Feb 2011 11:09:44 -0600 Subject: [petsc-users] value of PETSC_ARCH In-Reply-To: References: Message-ID: > > Hi to install PETSc without the debugging version, what value of PETSC_ARCH > should I give? Or is this automatically decided by PETSc during the > configure sterp? Anything you like to distinguish the different arches. PETSC_ARCH is just a unique label so that you can install multiple versions of PETSc without duplicating the entire source tree. For example, $ ls $PETSC_DIR ... darwin10.5.0-cxx-debug darwin10.5.0-sieve-debug darwin10.5.0-cxx-intel darwin10.5.0-sieve-intel ... are four different installs (arches) I have with the name of each PETSC_ARCH reminding me what the difference is between them. Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpacull at fluorem.com Fri Feb 18 12:00:44 2011 From: fpacull at fluorem.com (francois pacull) Date: Fri, 18 Feb 2011 19:00:44 +0100 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> References: <4D5E8EA1.8090809@fluorem.com> <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> Message-ID: <4D5EB3CC.2090106@fluorem.com> Thanks a lot Barry, I did include the new function PCASMFreeSpace to asm.c and compile it... I will measure the effect on memory next week. Regards, francois. Barry Smith wrote: > You will still need to delete your reference to the pmat by calling MatDestroy(pmat); after you have called KSPSetOperators(). > > Barry > > On Feb 18, 2011, at 9:42 AM, Barry Smith wrote: > > >> On Feb 18, 2011, at 9:22 AM, francois pacull wrote: >> >> >>> Dear PETSc team, >>> >>> I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. >>> >>> So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: >>> 1 - the local rows of Amat (ksp&pc's linear system matrix) >>> 2 - the local rows of Pmat (ksp&pc's precond matrix) >>> 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) >>> 4 - the incomplete factorization of P[i] (subpc's ILU matrix) >>> >>> Is it correct? >>> If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? >>> >> You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function >> >> PCASMFreeSpace(PC pc) >> { >> PC_ASM *osm = (PC_ASM*)pc->data; >> PetscErrorCode ierr; >> >> if (osm->pmat) { >> if (osm->n_local_true > 0) { >> ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); >> } >> osm->pmat = 0; >> } >> if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} >> return 0; >> } >> run make in that directory. >> >> Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). 
>> >> report any problems to petsc-maint at mcs.anl.gov >> >> >> >>> When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? >>> >> Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) >> >> >>> From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? >>> >> Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" >> >> >> >>> Thanks for your help, >>> francois. >>> >>> >>> > > > From gaurish108 at gmail.com Fri Feb 18 12:11:22 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 13:11:22 -0500 Subject: [petsc-users] Strange behavior of -log_summary and final answer Message-ID: Hi I am using petsc-3.1-p7. My code seems to be behaving strangely during the execution step. I am solving a simple least squares problem with the LSQR routine. The options that I am using at the terminal are.: mpiexec -n 1 ./rect_input -f A2 -vector b2 -ksp_type lsqr -pc_type none -log_summary -ksp_max_it 1038 Even though I have used the -log_summary flag the performance summary does *not* get displayed on some occasions. The answer also is different from the answer I expect (which I obtain from matlab). But after running the executable twice or thrice, I get the performance summary along with the correct answer. Why is this happening? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 18 12:30:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 12:30:47 -0600 Subject: [petsc-users] Strange behavior of -log_summary and final answer In-Reply-To: References: Message-ID: Run with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind Barry On Feb 18, 2011, at 12:11 PM, Gaurish Telang wrote: > Hi I am using petsc-3.1-p7. > > My code seems to be behaving strangely during the execution step. > > I am solving a simple least squares problem with the LSQR routine. The options that I am using at the terminal are.: > > mpiexec -n 1 ./rect_input -f A2 -vector b2 -ksp_type lsqr -pc_type none -log_summary -ksp_max_it 1038 > > Even though I have used the -log_summary flag the performance summary does *not* get displayed on some occasions. The answer also is different from the answer I expect (which I obtain from matlab). > > But after running the executable twice or thrice, I get the performance summary along with the correct answer. Why is this happening? > > > > > > > > > > > > > > > > > > > > From gdiso at ustc.edu Sat Feb 19 02:25:55 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 19 Feb 2011 16:25:55 +0800 Subject: [petsc-users] more flexible MatSetValues? Message-ID: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding From jed at 59A2.org Sat Feb 19 03:21:09 2011 From: jed at 59A2.org (Jed Brown) Date: Sat, 19 Feb 2011 10:21:09 +0100 Subject: [petsc-users] more flexible MatSetValues? 
In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: Values in each location are often set many times. Once per element in FEM, so about 20 times for P1 tets. That uses a lot more memory and you need to sort that beast to count correctly. Using a separate dynamic data structure for each row would be a lot more mallocs, but you could keep the rows sorted and avoid storing 20 copies, however insertion is still expensive. A heap is nice for insertion, but not for searching. So dynamic data structures could help, but they still cost quite a bit. The preallocation problem is trivial for finite difference methods so any useful solution needs to handle many insertions to the same location. On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Sat Feb 19 04:19:35 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Sat, 19 Feb 2011 10:19:35 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102172301.16497.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> Message-ID: <201102191019.42133.juhaj@iki.fi> > I don't know how to handle the f'(1) = b. I was always taught to first > introduce new variables to reduce the problem to a first order equation. > For example let g = f' and the new problem is F(f,g,g') = 0 with the > additional equations g = f' now there are no second derivatives. Yes, that's always an option and for time stepping, for instance, that is probably always the best way to go, but I did not think that would be necessary for a simple second order ODE - albeit a non-linear one. Let me see what happens if I do that... Cheers, -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From gdiso at ustc.edu Sat Feb 19 07:43:47 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 19 Feb 2011 21:43:47 +0800 (CST) Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> So dynamic array is suitable for acting as a plus to the preallocation. Only a few extra matrix entries not considered in the preallocation are needed to be processed. For example, an integral boundary condition with dynamic integration range may have different nonzero entry in a row, which can be hold by the dynamic array. Values in each location are often set many times. Once per element in FEM, so about 20 times for P1 tets. That uses a lot more memory and you need to sort that beast to count correctly. Using a separate dynamic data structure for each row would be a lot more mallocs, but you could keep the rows sorted and avoid storing 20 copies, however insertion is still expensive. 
A heap is nice for insertion, but not for searching. So dynamic data structures could help, but they still cost quite a bit. The preallocation problem is trivial for finite difference methods so any useful solution needs to handle many insertions to the same location. On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 19 08:27:51 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 19 Feb 2011 08:27:51 -0600 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: <097C0B3B-A792-42B4-83CD-C2EF1F642BA8@mcs.anl.gov> Because PETSc is designed as an object oriented library with class inheritance one could derive a new subclass with dynamic allocation from the current class without needing to write much new code. Barry On Feb 19, 2011, at 2:25 AM, Gong Ding wrote: > Hi, > After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. > > Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? > And flush the buffer to real a,i,j array when MatAssemblyEnd is called? > > Gong Ding From jed at 59A2.org Sat Feb 19 08:45:02 2011 From: jed at 59A2.org (Jed Brown) Date: Sat, 19 Feb 2011 15:45:02 +0100 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: On Sat, Feb 19, 2011 at 14:43, Gong Ding wrote: an integral boundary condition with dynamic integration range Isn't there a maximum possible stencil for that boundary node? Why not preallocate for all of those and keep typical sparsity for the rest of the matrix? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Sat Feb 19 10:22:55 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sun, 20 Feb 2011 00:22:55 +0800 (CST) Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: <27526938.31101298132575330.JavaMail.coremail@mail.ustc.edu> Some nonlocal phenomenon such as band to band tunneling in semiconductor. an integral boundary condition with dynamic integration range Isn't there a maximum possible stencil for that boundary node? Why not preallocate for all of those and keep typical sparsity for the rest of the matrix? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Feb 19 10:24:09 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 19 Feb 2011 10:24:09 -0600 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: On Sat, Feb 19, 2011 at 7:43 AM, Gong Ding wrote: > So dynamic array is suitable for acting as a plus to the preallocation. 
> Only a few extra matrix entries not considered in the preallocation are > needed to be processed. > For example, an integral boundary condition with dynamic integration range > may have different nonzero entry in a row, > which can be hold by the dynamic array. > I did benchmark of the STL dynamic data structures, and the memory overhead is quite large. The problem here is not necessarily what could be done, but what community expectations are. People are not going to ditch their old Fortran code for something that allocations 3-4 times the memory. As Barry points out, you could easily make a new subclass. Matt > > > > Values in each location are often set many times. Once per element in FEM, > so about 20 times for P1 tets. That uses a lot more memory and you need to > sort that beast to count correctly. Using a separate dynamic data structure > for each row would be a lot more mallocs, but you could keep the rows sorted > and avoid storing 20 copies, however insertion is still expensive. A heap is > nice for insertion, but not for searching. > > So dynamic data structures could help, but they still cost quite a bit. The > preallocation problem is trivial for finite difference methods so any useful > solution needs to handle many insertions to the same location. > > On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: > > Hi, > After reading the source code of aij.c, I think the MatSetValues function > can be more flexible when preallocation is not correct. > > Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer > the operation? > And flush the buffer to real a,i,j array when MatAssemblyEnd is called? > > Gong Ding > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Mon Feb 21 06:10:19 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Mon, 21 Feb 2011 12:10:19 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102191019.42133.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> Message-ID: <201102211210.23297.juhaj@iki.fi> > > introduce new variables to reduce the problem to a first order equation. > > For example let g = f' and the new problem is F(f,g,g') = 0 with the > > additional equations g = f' now there are no second derivatives. > Let me see what happens if I do that... Ok, so this helps. Now I can get the solution to converge on a small lattice, of less than 20 points. Increasing the lattice gives divergent zig-zag "solutions". Now this is usual central differences behaviour: it decouples even lattice points from odd ones and now that I have both f and f' as unknowns, this decoupling is total. (It was not previously, since f'', computed from f, does not decouple.) Changing to simple forward differences does not help, but changing to three- point forward differences (=five-point stencil, but the backwards points are not used) fixes the problem and I now get convergence. That is, thanks for all the help. I can now return to my actual equation, which still does not converge with these tricks on any lattice larger than about 50 points. I suppose the problem here is similar and I just need to find a better discretisation. 
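In symbols, the reduction and the two stencils being compared are roughly (notation is illustrative, not taken from the actual code):

  introduce g = f', and solve  F(f, g, g') = 0  together with  g - f' = 0

  centered:   f'(x_i) ≈ (f_{i+1} - f_{i-1}) / (2h)
              skips x_i itself, so odd and even lattice points decouple once g is a separate unknown
  one-sided:  f'(x_i) ≈ (-3 f_i + 4 f_{i+1} - f_{i+2}) / (2h)
              still second-order accurate, but involves x_i and its two forward neighbours, which removes the decoupling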
Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From hung.thanh.nguyen at petrell.no Mon Feb 21 09:12:14 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Mon, 21 Feb 2011 16:12:14 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102211210.23297.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> <201102211210.23297.juhaj@iki.fi> Message-ID: Hi Pets use I just install Pets on Windows (I am using C compiler and ITL MKL). And, then running ex2.cpp .... to get error : Error 2 error: identifier "_intel_fast_memcpy" is undefined C:\cygwin\home\Hung\petsc-3.1- p7\include\petscsys.h 1775 Please help me. Best regard Hung T. Nguyen -----Original Message----- From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Juha J?ykk? Sent: 21. februar 2011 13:10 To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSPBuildSolution > > introduce new variables to reduce the problem to a first order equation. > > For example let g = f' and the new problem is F(f,g,g') = 0 with > > the additional equations g = f' now there are no second derivatives. > Let me see what happens if I do that... Ok, so this helps. Now I can get the solution to converge on a small lattice, of less than 20 points. Increasing the lattice gives divergent zig-zag "solutions". Now this is usual central differences behaviour: it decouples even lattice points from odd ones and now that I have both f and f' as unknowns, this decoupling is total. (It was not previously, since f'', computed from f, does not decouple.) Changing to simple forward differences does not help, but changing to three- point forward differences (=five-point stencil, but the backwards points are not used) fixes the problem and I now get convergence. That is, thanks for all the help. I can now return to my actual equation, which still does not converge with these tricks on any lattice larger than about 50 points. I suppose the problem here is similar and I just need to find a better discretisation. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- From knepley at gmail.com Mon Feb 21 09:36:16 2011 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Feb 2011 09:36:16 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> <201102211210.23297.juhaj@iki.fi> Message-ID: Send all the output of 'make test' along with configure.log and make.log to petsc-maint at mcs.anl.gov Matt On Mon, Feb 21, 2011 at 9:12 AM, Hung Thanh Nguyen < hung.thanh.nguyen at petrell.no> wrote: > Hi Pets use > I just install Pets on Windows (I am using C compiler and ITL MKL). And, > then running ex2.cpp .... to get error : > > Error 2 error: identifier "_intel_fast_memcpy" is undefined > C:\cygwin\home\Hung\petsc-3.1- > p7\include\petscsys.h 1775 > > Please help me. Best regard Hung T. 
Nguyen > > -----Original Message----- > From: petsc-users-bounces at mcs.anl.gov [mailto: > petsc-users-bounces at mcs.anl.gov] On Behalf Of Juha J?ykk? > Sent: 21. februar 2011 13:10 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSPBuildSolution > > > > introduce new variables to reduce the problem to a first order > equation. > > > For example let g = f' and the new problem is F(f,g,g') = 0 with > > > the additional equations g = f' now there are no second derivatives. > > Let me see what happens if I do that... > > Ok, so this helps. Now I can get the solution to converge on a small > lattice, of less than 20 points. > > Increasing the lattice gives divergent zig-zag "solutions". Now this is > usual central differences behaviour: it decouples even lattice points from > odd ones and now that I have both f and f' as unknowns, this decoupling is > total. (It was not previously, since f'', computed from f, does not > decouple.) > > Changing to simple forward differences does not help, but changing to > three- point forward differences (=five-point stencil, but the backwards > points are not used) fixes the problem and I now get convergence. > > That is, thanks for all the help. I can now return to my actual equation, > which still does not converge with these tricks on any lattice larger than > about 50 points. I suppose the problem here is similar and I just need to > find a better discretisation. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemens.domanig at uibk.ac.at Tue Feb 22 04:29:16 2011 From: clemens.domanig at uibk.ac.at (Clemens Domanig) Date: Tue, 22 Feb 2011 11:29:16 +0100 Subject: [petsc-users] matrix/vector-library Message-ID: <4D638FFC.6060607@uibk.ac.at> Hi out there, just a short question: What matrix/vector-library do you use for doing calculations with small matrices/vectors while using Petsc for the large problems? I have to do lots of calculations with small matrices before assembling the large system of equations for which I use Petsc. So I want to make sure that there is no namespace-trouble, etc. Thx for your help - Clemens Domanig From jed at 59A2.org Tue Feb 22 05:14:15 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 22 Feb 2011 12:14:15 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: On Tue, Feb 22, 2011 at 11:29, Clemens Domanig wrote: > What matrix/vector-library do you use for doing calculations with small > matrices/vectors while using Petsc for the large problems? What language? How "small" is small? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 22 07:49:04 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Feb 2011 07:49:04 -0600 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: BLAS/LAPACK. 
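A minimal sketch of what a small dense solve with LAPACK looks like from C (this assumes the usual Fortran symbol convention with a trailing underscore and column-major storage; the 3x3 values are arbitrary, and you link against the same BLAS/LAPACK that PETSc was configured with, so there is no namespace clash):

  #include <stdio.h>

  /* LAPACK dense solve A x = b; everything is passed by reference, Fortran style. */
  extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                     int *ipiv, double *b, int *ldb, int *info);

  int main(void)
  {
    double A[9] = { 4, 1, 0,    /* column 1 */
                    1, 3, 1,    /* column 2 */
                    0, 1, 2 };  /* column 3 */
    double b[3] = { 1, 2, 3 };  /* right-hand side, overwritten with the solution */
    int    n = 3, nrhs = 1, ipiv[3], info;

    dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);
    if (info) { printf("dgesv failed, info = %d\n", info); return 1; }
    printf("x = %g %g %g\n", b[0], b[1], b[2]);
    return 0;
  }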
Matt On Tue, Feb 22, 2011 at 4:29 AM, Clemens Domanig wrote: > Hi out there, > > just a short question: What matrix/vector-library do you use for doing > calculations with small matrices/vectors while using Petsc for the large > problems? > I have to do lots of calculations with small matrices before assembling the > large system of equations for which I use Petsc. So I want to make sure that > there is no namespace-trouble, etc. > > Thx for your help - Clemens Domanig > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemens.domanig at uibk.ac.at Tue Feb 22 09:42:00 2011 From: clemens.domanig at uibk.ac.at (Clemens Domanig) Date: Tue, 22 Feb 2011 16:42:00 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: <4D63D948.6080109@uibk.ac.at> Maybe someone knows a library with commands that are similar to MatLab because I have to port hundreds lines of MatLab-code to C++. Thx - C. Clemens Domanig wrote: > Hi out there, > > just a short question: What matrix/vector-library do you use for doing > calculations with small matrices/vectors while using Petsc for the large > problems? > I have to do lots of calculations with small matrices before assembling > the large system of equations for which I use Petsc. So I want to make > sure that there is no namespace-trouble, etc. > > Thx for your help - Clemens Domanig From u.tabak at tudelft.nl Tue Feb 22 09:43:47 2011 From: u.tabak at tudelft.nl (Umut Tabak) Date: Tue, 22 Feb 2011 16:43:47 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D63D948.6080109@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> <4D63D948.6080109@uibk.ac.at> Message-ID: <4D63D9B3.5070804@tudelft.nl> On 02/22/2011 04:42 PM, Clemens Domanig wrote: > knows a library with commands that are similar to MatLab because I Not exactly equal but similar, see uBlas from Boost (documentation is a bit of the downside though...) or another one which I did not check, namely, Eigen, U. From jed at 59A2.org Tue Feb 22 09:49:11 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 22 Feb 2011 16:49:11 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D63D948.6080109@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> <4D63D948.6080109@uibk.ac.at> Message-ID: On Tue, Feb 22, 2011 at 16:42, Clemens Domanig wrote: > Maybe someone knows a library with commands that are similar to MatLab > because I have to port hundreds lines of MatLab-code to C++. >From C++, you might consider a template library like Eigen ( http://eigen.tuxfamily.org). It overloads the usual arithmetic operators and performance is very good for small sizes. The downside is longer compilation time than a classic library and confusing error messages if you have type errors. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Tue Feb 22 11:06:22 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 22 Feb 2011 12:06:22 -0500 Subject: [petsc-users] Pre-conditioners in PETSc Message-ID: I am quite confused on using pre-conditioners in PETSc (1) When we use KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat,MatStructure flag), why does the manual page say that Pmat is usually the same as Amat? 
Is Pmat the preconditioning matrix itself, or is Pmat a matrix to which preconditioning techniques must be applied via "-pc_type " ? (2) Also suppose the succeeding statement of KSPSetOperators is KSPSetFromOptions(ksp_context) and I pass "-pc_type none" at the terminal, would this mean that Pmat is not at all needed in PETSc's calculations?? Thank you, Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Tue Feb 22 11:14:35 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Tue, 22 Feb 2011 10:14:35 -0700 Subject: [petsc-users] Pre-conditioners in PETSc In-Reply-To: References: Message-ID: <1298394875.2045.3.camel@echo.lanl.gov> On Tue, 2011-02-22 at 12:06 -0500, Gaurish Telang wrote: > I am quite confused on using pre-conditioners in PETSc > > (1) > When we use KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat,MatStructure flag), why does the manual page say that Pmat is usually the same as Amat? > > Is Pmat the preconditioning matrix itself, or is Pmat a matrix to which preconditioning techniques must be applied via "-pc_type " ? The 2nd. Pmat is NOT the approximate inverse of Amat, it is a matrix whose approximate inverse will be used to multiple Amat before the KSP is used. > > (2) > Also suppose the succeeding statement of KSPSetOperators is KSPSetFromOptions(ksp_context) and I pass "-pc_type none" at the terminal, would this mean that Pmat is not at all needed > > in PETSc's calculations?? Correct, pmat is not used in that case. In that case, you still have to pass something in to KSPSetOperators, so Amat for both is the likely choice. Ethan > > Thank you, > > Gaurish > > > > > -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From C.Klaij at marin.nl Wed Feb 23 09:45:39 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Wed, 23 Feb 2011 15:45:39 +0000 Subject: [petsc-users] PCDiagonalScale Message-ID: I'm trying to understand the use of PCDiagonalScale since I want to apply additional diagonal scaling when solving my linear system. As a first step I modified src/ksp/ksp/examples/tutorials/ex2f.F in petsc-3.1-p7 as follows: 1) at line 87 added 3 lines: PC pc PCType ptype PetscScalar tol 2) then I uncommented lines 247 -- 252 (the ones to use PCJACOBI) 3) at line 253 I added : call PCDiagonalScale(pc,PETSC_TRUE,ierr) Running "make ex2f" gives: ex2f.o: In function `MAIN__': ex2f.F:(.text+0x767): undefined reference to `pcdiagonalscale_' Without the call to PCDiagonalScale "make ex2f" does not give any errors and runs fine... dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From jed at 59A2.org Wed Feb 23 09:57:30 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 23 Feb 2011 16:57:30 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Wed, Feb 23, 2011 at 16:45, Klaij, Christiaan wrote: > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > This is not the correct interface. You want PCDiagonalScaleSet(pc,X,ierr). PCDiagonalScale() is a getter with petsc-3.1 and is not available from Fortran (but you don't need it). The naming has been made consistent in petsc-dev. 
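In C the resulting call sequence would look roughly like the sketch below, assuming ksp, b and x are already set up as in ex2f; the scaling values themselves are placeholders, and in Fortran the same calls just take a trailing ierr argument:

  PC  pc;
  Vec s;                                      /* entries of the diagonal scaling D */

  ierr = VecDuplicate(b,&s);CHKERRQ(ierr);
  ierr = VecSet(s,1.0);CHKERRQ(ierr);         /* replace 1.0 with the desired per-row scaling */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCDiagonalScaleSet(pc,s);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = VecDestroy(s);CHKERRQ(ierr);         /* petsc-3.1 VecDestroy takes the Vec itself */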
-------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 01:16:07 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 07:16:07 +0000 Subject: [petsc-users] PCDiagonalScale Message-ID: OK. So if I understand correctly (?) in Fortan all I need is: call KSPGetPC call PCDiagonalSet call KSPSolve I'm using a MatShell and PCShell but I guess that doesn't matter? > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > This is not the correct interface. You want PCDiagonalScaleSet(pc,X,ierr). PCDiagonalScale() is a getter with petsc-3.1 and is not available from Fortran (but you don't need it). The naming has been made consistent in petsc-dev. dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From jed at 59A2.org Thu Feb 24 03:50:02 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 10:50:02 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Thu, Feb 24, 2011 at 08:16, Klaij, Christiaan wrote: > OK. So if I understand correctly (?) in Fortan all I need is: > > call KSPGetPC > call PCDiagonalSet > As per my last message, it is spelled PCDiagonalScaleSet in petsc-3.1. > call KSPSolve > > I'm using a MatShell and PCShell but I guess that doesn't matter? > That doesn't matter, diagonal scaling occurs at a higher level than the individual implementation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.gouarin at math.u-psud.fr Thu Feb 24 03:56:28 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Thu, 24 Feb 2011 10:56:28 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E5F49.9090001@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> Message-ID: <4D662B4C.2090307@math.u-psud.fr> Hi, I take the GetMatrix code and I re write it for my Stokes problem with 2 DA grids: one for the velocity and an other for the pression. The preallocation is now correct but I have now a problem to use fieldsplit. I set block size to 3 for my matrix but I'm not sure that I can use it because I don't have the same number of points for each field. I don't hnow how petsc defines the blocks. How can I use again fieldsplit for the preconditioner ? Thanks. Loic On 18/02/2011 13:00, gouarin wrote: > On 18/02/2011 12:07, Dave May wrote: >> >> How much memory is used when you use >> -stokes_fieldsplit_0_ksp_max_it 1 >> -stokes_fieldsplit_0_pc_type jacobi >> ? >> It's possible that the copy of the diagonal blocks occurring when you >> invoke Fieldsplit just by itself is using all your available memory. I >> wouldn't be surprised with a stencil width of 2.... > This is the memory info given by the log_summary for nv=19 > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' > Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > Viewer 1 0 > 0 0 > Index Set 30 24 96544 0 > IS L to G Mapping 4 0 0 0 > Vec 46 17 > 338344 0 > Vec Scatter 12 0 > 0 0 > Matrix 22 0 > 0 0 > Distributed array 2 0 0 0 > Preconditioner 3 0 > 0 0 > Krylov Solver 3 0 > 0 0 > ======================================================================================================================== > > > and the malloc_info > > ------------------------------------------ > [0] Maximum memory PetscMalloc()ed 184348608 maximum size of entire > process 225873920 > [0] Memory usage sorted by function > [0] 2 3216 ClassPerfLogCreate() > [0] 2 1616 ClassRegLogCreate() > [0] 6 9152 DACreate() > [0] 17 114128 DACreate_3D() > [0] 3 48 DAGetCoordinateDA() > [0] 10 265632 DAGetMatrix3d_MPIAIJ() > [0] 3 48 DASetVertexDivision() > [0] 2 6416 EventPerfLogCreate() > [0] 1 12800 EventPerfLogEnsureSize() > [0] 2 1616 EventRegLogCreate() > [0] 1 3200 EventRegLogRegister() > [0] 12 329376 ISAllGather() > [0] 50 89344 ISCreateBlock() > [0] 25 354768 ISCreateGeneral() > [0] 60 7920 ISCreateStride() > [0] 12 161728 ISGetIndices_Stride() > [0] 2 21888 ISLocalToGlobalMappingBlock() > [0] 2 21888 ISLocalToGlobalMappingCreate() > [0] 12 1728 ISLocalToGlobalMappingCreateNC() > [0] 9 2544 KSPCreate() > [0] 1 16 KSPCreate_MINRES() > [0] 1 16 KSPCreate_Richardson() > [0] 3 48 KSPDefaultConvergedCreate() > [0] 66 41888 MatCreate() > [0] 6 960 MatCreate_MPIAIJ() > [0] 16 5632 MatCreate_SeqAIJ() > [0] 4 12000 MatGetRow_MPIAIJ() > [0] 4 64 MatGetSubMatrices_MPIAIJ() > [0] 160 941760 MatGetSubMatrices_MPIAIJ_Local() > [0] 4 121664 MatGetSubMatrix_MPIAIJ_Private() > [0] 16 304000 MatMarkDiagonal_SeqAIJ() > [0] 80 181061344 MatSeqAIJSetPreallocation_SeqAIJ() > [0] 12 113792 MatSetUpMultiply_MPIAIJ() > [0] 12 288 MatStashCreate_Private() > [0] 50 864 MatStashScatterBegin_Private() > [0] 120 108096 MatZeroRows_MPIAIJ() > [0] 10 182560 Mat_CheckInode() > [0] 9 1776 PCCreate() > [0] 1 144 PCCreate_FieldSplit() > [0] 2 64 PCCreate_Jacobi() > [0] 4 192 PCFieldSplitSetFields_FieldSplit() > [0] 1 16 PCSetFromOptions_FieldSplit() > [0] 5 22864 PCSetUp_FieldSplit() > [0] 4 64 PetscCommDuplicate() > [0] 1 4112 PetscDLLibraryOpen() > [0] 6 24576 PetscDLLibraryRetrieve() > [0] 45 1712 PetscDLLibrarySym() > [0] 579 27792 PetscFListAdd() > [0] 48 2112 PetscGatherMessageLengths() > [0] 52 832 PetscGatherNumberOfMessages() > [0] 90 4320 PetscLayoutCreate() > [0] 64 1392 PetscLayoutSetUp() > [0] 4 64 PetscLogPrintSummary() > [0] 12 384 PetscMaxSum() > [0] 24 6528 PetscOListAdd() > [0] 28 1792 PetscObjectSetState() > [0] 8 192 PetscOptionsGetEList() > [0] 16 4842288 PetscPostIrecvInt() > [0] 12 4842224 PetscPostIrecvScalar() > [0] 0 32 PetscPushSignalHandler() > [0] 1 432 PetscStackCreate() > [0] 1798 54816 PetscStrallocpy() > [0] 30 248832 PetscStrreplace() > [0] 2 45888 PetscTableAdd() > [0] 24 446528 PetscTableCreate() > [0] 3 96 PetscTokenCreate() > [0] 1 16 PetscViewerASCIIMonitorCreate() > [0] 1 16 PetscViewerASCIIOpen() > [0] 3 496 PetscViewerCreate() > [0] 1 64 PetscViewerCreate_ASCII() > [0] 2 528 StackCreate() > [0] 2 1008 StageLogCreate() > [0] 6 14400 User provided function() > [0] 138 58880 VecCreate() > [0] 66 1401952 VecCreate_MPI_Private() > [0] 7 221312 VecCreate_Seq() > [0] 9 288 VecCreate_Seq_Private() > [0] 6 160 VecDuplicateVecs_Default() > [0] 3 3008 VecGetArray3d() > [0] 42 49536 VecScatterCreate() > [0] 16 512 VecScatterCreateCommon_PtoS() > [0] 20 213024 VecScatterCreate_PtoP() > [0] 252 881536 VecScatterCreate_PtoS() > [0] 74 1184 
VecStashCreate_Private() > > -- Loic Gouarin Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From zonexo at gmail.com Thu Feb 24 04:01:57 2011 From: zonexo at gmail.com (TAY wee-beng) Date: Thu, 24 Feb 2011 11:01:57 +0100 Subject: [petsc-users] Problem using MatSetOption Message-ID: <4D662C95.9000700@gmail.com> Hi, I'm trying to use the MatSetOption in Fortran: call MatAssemblyBegin(A_mat_uv,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_mat_uv,MAT_FINAL_ASSEMBLY,ierr) call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) However I got the error: Error: This name does not have a type, and must have an explicit type. [MAT_NO_NEW_NONZERO_LOCATIONS] call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) I tried removing the PETSC_TRUE argument but it also can't work. Thanks for the help. -- Yours sincerely, TAY wee-beng From jed at 59A2.org Thu Feb 24 04:09:18 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 11:09:18 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D662B4C.2090307@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> <4D662B4C.2090307@math.u-psud.fr> Message-ID: On Thu, Feb 24, 2011 at 10:56, gouarin wrote: > I set block size to 3 for my matrix but I'm not sure that I can use it > because I don't have the same number of points for each field. I don't hnow > how petsc defines the blocks. > > How can I use again fieldsplit for the preconditioner ? > It is worth switching to petsc-dev for this: 1. Use DMComposite to "glue" the velocity and pressure DAs together. 2. Call PCSetDM (or the higher level KSPSetDM, SNESSetDM, or TSSetDM as appropriate) and pass in the DMComposite. Now when you -pc_type fieldsplit, it will automatically pick up the fields from the DMComposite (in the order they were registered). That would give you two splits in this case. You can use DMComposite with petsc-3.1, but you have to create index sets yourself and call PCFieldSplitSetIS(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 04:19:20 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 11:19:20 +0100 Subject: [petsc-users] Problem using MatSetOption In-Reply-To: <4D662C95.9000700@gmail.com> References: <4D662C95.9000700@gmail.com> Message-ID: On Thu, Feb 24, 2011 at 11:01, TAY wee-beng wrote: > Error: This name does not have a type, and must have an explicit type. > [MAT_NO_NEW_NONZERO_LOCATIONS] > call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > The last reference to MAT_NO_NEW_NONZERO_LOCATIONS was removed more than two years ago. You should use MatSetOption(A_mat_uv,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 05:14:27 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 11:14:27 +0000 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: , Message-ID: Thanks Jed. Sorry for the typo, obviously I meant to write PCDiagonalScaleSet, not PCDiagonalSet. I'm using FGMRES but apparently it doesn't support diagonal scaling. Chris From: five9a2 at gmail.com [five9a2 at gmail.com] on behalf of Jed Brown [jed at 59A2.org] Sent: Thursday, February 24, 2011 10:50 AM On Thu, Feb 24, 2011 at 08:16, Klaij, Christiaan wrote: OK. 
So if I understand correctly (?) in Fortan all I need is: call KSPGetPC call PCDiagonalSet As per my last message, it is spelled PCDiagonalScaleSet in petsc-3.1. call KSPSolve I'm using a MatShell and PCShell but I guess that doesn't matter? That doesn't matter, diagonal scaling occurs at a higher level than the individual implementation. dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From loic.gouarin at math.u-psud.fr Thu Feb 24 05:28:28 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Thu, 24 Feb 2011 12:28:28 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> <4D662B4C.2090307@math.u-psud.fr> Message-ID: <4D6640DC.1010303@math.u-psud.fr> I already use petsc-dev and DMComposite. I call PCSetDM and it works. Thanks, Loic On 24/02/2011 11:09, Jed Brown wrote: > On Thu, Feb 24, 2011 at 10:56, gouarin > wrote: > > I set block size to 3 for my matrix but I'm not sure that I can > use it because I don't have the same number of points for each > field. I don't hnow how petsc defines the blocks. > > How can I use again fieldsplit for the preconditioner ? > > > It is worth switching to petsc-dev for this: > > 1. Use DMComposite to "glue" the velocity and pressure DAs together. > > 2. Call PCSetDM (or the higher level KSPSetDM, SNESSetDM, or TSSetDM > as appropriate) and pass in the DMComposite. > > Now when you -pc_type fieldsplit, it will automatically pick up the > fields from the DMComposite (in the order they were registered). That > would give you two splits in this case. > > You can use DMComposite with petsc-3.1, but you have to create index > sets yourself and call PCFieldSplitSetIS(). -- Loic Gouarin Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 05:30:15 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 12:30:15 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Thu, Feb 24, 2011 at 12:14, Klaij, Christiaan wrote: > > I'm using FGMRES but apparently it doesn't support diagonal scaling. > GCR also tolerates a variable preconditioner and it does not have such a check. I don't know if that means it can use diagonal scaling or just that someone forgot to check, but you could try it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 07:34:16 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 13:34:16 +0000 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: , Message-ID: I tried GCR but at first sight the diagonal scaling doesn't seem to do anything. From: five9a2 at gmail.com [five9a2 at gmail.com] on behalf of Jed Brown [jed at 59A2.org] Sent: Thursday, February 24, 2011 12:30 PM On Thu, Feb 24, 2011 at 12:14, Klaij, Christiaan wrote: I'm using FGMRES but apparently it doesn't support diagonal scaling. GCR also tolerates a variable preconditioner and it does not have such a check. I don't know if that means it can use diagonal scaling or just that someone forgot to check, but you could try it. dr. ir. 
Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From bsmith at mcs.anl.gov Thu Feb 24 08:08:42 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 24 Feb 2011 08:08:42 -0600 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: <699188CA-750D-446A-AD58-C62CB2A75CE2@mcs.anl.gov> On Feb 23, 2011, at 9:45 AM, Klaij, Christiaan wrote: > I'm trying to understand the use of PCDiagonalScale since I want to apply additional diagonal scaling when solving my linear system. Do you want to use it exactly for the reason given in PCSetDiagonalScale - Indicates the left scaling to use to apply an additional left and right scaling as needed by certain time-stepping codes. Logically Collective on PC Input Parameters: + pc - the preconditioner context - s - scaling vector Level: intermediate Notes: The system solved via the Krylov method is $ D M A D^{-1} y = D M b for left preconditioning or $ D A M D^{-1} z = D b for right preconditioning > > As a first step I modified src/ksp/ksp/examples/tutorials/ex2f.F in petsc-3.1-p7 as follows: > > 1) at line 87 added 3 lines: > PC pc > PCType ptype > PetscScalar tol > > 2) then I uncommented lines 247 -- 252 (the ones to use PCJACOBI) > > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > > Running "make ex2f" gives: > > ex2f.o: In function `MAIN__': > ex2f.F:(.text+0x767): undefined reference to `pcdiagonalscale_' > > Without the call to PCDiagonalScale "make ex2f" does not give any errors and runs fine... > > > dr. ir. Christiaan Klaij > CFD Researcher > Research & Development > E mailto:C.Klaij at marin.nl > T +31 317 49 33 44 > > MARIN > 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands > T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl > From brtnfld at uiuc.edu Thu Feb 24 09:33:59 2011 From: brtnfld at uiuc.edu (M. Scot Breitenfeld) Date: Thu, 24 Feb 2011 09:33:59 -0600 Subject: [petsc-users] MatSetValues is expensive Message-ID: <4D667A67.50700@uiuc.edu> Hi, I'm working on a particle type method and I'm using MatSetValues, to insert values (add_values) into my matrix. Currently I: do i, loop over number of particles do j, loop over particles in i's family ... in row i's dof; insert values in columns of j's (x,y,z) dofs (3 calls to MatSetValues for i's x,y,z dof) in row j's dof; insert values in columns of i's (x,y,z) dofs (3 calls to MatSetValues for j's x,y,z dof) ... enddo enddo Running serially, using MatSetValues it takes 294.8 sec. to assemble the matrix, if I remove the calls to MatSetValues it takes 29.5 sec. to run through the same loops, so the MatSetValues calls take up 90% of the assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. I guess I need to add extra storage so I can call the MatSetValues with more values so that I can call it less, or just do a lot of recalculating of values so that I can add an entire row at once. I just want to make sure this is expected behavior and not something that I'm doing wrong before I start to rewrite my assembling routine. Probably a hash table would be better but I don't want to store that and then convert that to a CRS matrix, I'm already running into memory issues as it is. 
Just out of curiosity, wouldn't a finite element code have a similar situation, in that case you would form the local stiffness matrix and then insert that into the global stiffness matrix, so you would be calling MatSetValues "number of elements" times. From knepley at gmail.com Thu Feb 24 09:55:34 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Feb 2011 09:55:34 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D667A67.50700@uiuc.edu> References: <4D667A67.50700@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 9:33 AM, M. Scot Breitenfeld wrote: > Hi, > > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > You can set this whole row with a single call. > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to assemble the > matrix, if I remove the calls to MatSetValues it takes 29.5 sec. to run > through the same loops, so the MatSetValues calls take up 90% of the > assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. > Its hard to believe that the preallocation is correct. In order to check, use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR) before your MatSetValues() calls. > I guess I need to add extra storage so I can call the MatSetValues with > more values so that I can call it less, or just do a lot of > recalculating of values so that I can add an entire row at once. I just > Recalculating? > want to make sure this is expected behavior and not something that I'm > doing wrong before I start to rewrite my assembling routine. Probably a > hash table would be better but I don't want to store that and then > convert that to a CRS matrix, I'm already running into memory issues as > it is. > > Just out of curiosity, wouldn't a finite element code have a similar > situation, in that case you would form the local stiffness matrix and > then insert that into the global stiffness matrix, so you would be > calling MatSetValues "number of elements" times. > No, you call it once per element matrix. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 10:20:32 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 17:20:32 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D667A67.50700@uiuc.edu> References: <4D667A67.50700@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 16:33, M. Scot Breitenfeld wrote: > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > How big is a typical family? > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > Why make three calls here instead of one? > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > Again, why call these separately? Also, is the matrix symmetric? 
> ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to assemble the > matrix, > Are you sure that it was preallocated correctly? Is the cost to compute the entries essentially zero? > if I remove the calls to MatSetValues it takes 29.5 sec. to run > through the same loops, so the MatSetValues calls take up 90% of the > assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. > > I guess I need to add extra storage so I can call the MatSetValues with > more values so that I can call it less, or just do a lot of > recalculating of values so that I can add an entire row at once. I just > want to make sure this is expected behavior and not something that I'm > doing wrong before I start to rewrite my assembling routine. Probably a > hash table would be better but I don't want to store that and then > convert that to a CRS matrix, I'm already running into memory issues as > it is. > > Just out of curiosity, wouldn't a finite element code have a similar > situation, in that case you would form the local stiffness matrix and > then insert that into the global stiffness matrix, so you would be > calling MatSetValues "number of elements" times. > FEM has a simple quadrature loop that builds a dense element matrix, MatSetValues() is called once per element. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brtnfld at uiuc.edu Thu Feb 24 16:49:49 2011 From: brtnfld at uiuc.edu (M. Scot Breitenfeld) Date: Thu, 24 Feb 2011 16:49:49 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: References: <4D667A67.50700@uiuc.edu> Message-ID: <4D66E08D.4090609@uiuc.edu> On 02/24/2011 10:20 AM, Jed Brown wrote: > On Thu, Feb 24, 2011 at 16:33, M. Scot Breitenfeld > wrote: > > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > > > How big is a typical family? 300- 900 particles > > > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > > > Why make three calls here instead of one? I split the x-y-z row entries up depending on if the dof is prescribed, I'll combine them and see if that helps. > > > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > > > Again, why call these separately? Also, is the matrix symmetric? I split them depending on if the dof of particle j is prescribed. The matrix is symmetric and the percentage of non-zeros in the matrix has a range of 3%-7% depending on the number of particles in the family. > > > ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to > assemble the > matrix, > > > Are you sure that it was preallocated correctly? Is the cost to > compute the entries essentially zero? Assuming you preallocate by: CALL MatCreateMPIAIJ(PETSC_COMM_WORLD, & 3*mctr, 3*mctr, & total_global_nodes*3, total_global_nodes*3, & 0, d_nnz, 0, o_nnz, A, ierr) and I tried, as suggested, CALL MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE, ierr) and it does not report any problems. I would not say it's zero to compute the entries (I guess it takes about 3.5ms per particle for the calculations). This is a fairly small case, only 8000 particles. > > > > > FEM has a simple quadrature loop that builds a dense element matrix, > MatSetValues() is called once per element. 
That is what I meant, once per element. I do have a simpler formulation that allows me to enter an entire row all at once per particle: do i, loop over number of particles do j, loop over particles in i's family ... fill the rows of i's dofs (x,y,z) enddo call MatSetValues... enddo For the same case, it takes 6 seconds to assemble. If I remove the MatSetValues call it takes 0.66 seconds (for this case the calculations are REALLY simple), the family is also always smaller then the previous method. From jed at 59A2.org Thu Feb 24 16:59:19 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 23:59:19 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D66E08D.4090609@uiuc.edu> References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 23:49, M. Scot Breitenfeld wrote: > I would not say it's zero to compute the entries (I guess it takes about > 3.5ms per particle for the calculations). This is a fairly small case, > only 8000 particles. > With 300 to 900 interactions per particle, times 3 for each component, times two for lower and upper triangular piece. So we're looking at half a microsecond per insertion. That still seems like a lot, but perhaps the access pattern is very irregular because the particles have an essentially random ordering. Did you build --with-debugging=0? That should make a reasonable difference. Also, since the matrix is symmetric, you might consider using the SBAIJ matrix format. That will cut your storage costs almost in half and should speed up insertion because all interactions for a given particle will be in the same block-row, thus nearby in memory. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgrabbani at gmail.com Thu Feb 24 20:43:21 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 24 Feb 2011 18:43:21 -0800 Subject: [petsc-users] About MatLUFactor() Message-ID: Hi, I need to use MatLUFactor(), but do not know what to pass for the last 3 arguments. Would you please explain a bit; I did not find any example code for this one. PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) My matrix is a dense one and the result from this call will be used in MatMatSolve(). Regards, Golam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 24 21:26:43 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 24 Feb 2011 21:26:43 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: <4DB44D2D-CA7D-4D9B-94B9-98511778D496@mcs.anl.gov> For MATSEQDENSE the factorization is just a thin wrapper to LAPACK, the row, col and info are ignored. You can pass in 0 for all three arguments Barry On Feb 24, 2011, at 8:43 PM, Golam Rabbani wrote: > Hi, > > I need to use MatLUFactor(), but do not know what to pass for the last 3 arguments. Would you please explain a bit; I did not find any example code for this one. > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) > My matrix is a dense one and the result from this call will be used in MatMatSolve(). > > Regards, > Golam From hzhang at mcs.anl.gov Thu Feb 24 21:31:29 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 24 Feb 2011 21:31:29 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: For sequential dense in-place LU factorization, see /src/mat/examples/tests/ex1.c. For parallel, you need install PLAPACK. 
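(A minimal sketch of that in-place dense factorization, along the lines of ex1.c and of Barry's note above that the row, col and info arguments are ignored for MATSEQDENSE. The size n and the elided fill of A and b are assumptions, not code from the thread.)

Mat            A;
Vec            b, x;
PetscInt       n = 10;                         /* assumed problem size */
PetscErrorCode ierr;

ierr = MatCreateSeqDense(PETSC_COMM_SELF, n, n, PETSC_NULL, &A);CHKERRQ(ierr);
/* ... MatSetValues() / MatAssemblyBegin() / MatAssemblyEnd() to fill A ... */
ierr = MatLUFactor(A, 0, 0, 0);CHKERRQ(ierr);  /* A now holds its LU factors */
ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
/* ... fill b ... */
ierr = MatSolve(A, b, x);CHKERRQ(ierr);        /* x = inv(A) * b */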
It seems we do not have MatMatSolve() for dense matrix format. I can add it if you need it. Hong On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani wrote: > Hi, > > I need to use MatLUFactor(), but do not know what to pass for the last 3 > arguments. Would you please explain a bit; I did not find any example code > for this one. > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) > > My matrix is a dense one and the result from this call will be used in > MatMatSolve(). > > Regards, > Golam > From mgrabbani at gmail.com Fri Feb 25 01:56:30 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 24 Feb 2011 23:56:30 -0800 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Oh, It thought MatMatSolve() was there as you mention something about it in the FAQ. Please add it then. Thanks, Golam On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: > For sequential dense in-place LU factorization, see > /src/mat/examples/tests/ex1.c. > For parallel, you need install PLAPACK. > > It seems we do not have MatMatSolve() for dense matrix format. I can > add it if you need it. > > Hong > > On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani > wrote: > > Hi, > > > > I need to use MatLUFactor(), but do not know what to pass for the last 3 > > arguments. Would you please explain a bit; I did not find any example > code > > for this one. > > > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo > *info) > > > > My matrix is a dense one and the result from this call will be used in > > MatMatSolve(). > > > > Regards, > > Golam > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirzadeh at gmail.com Fri Feb 25 03:45:43 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 01:45:43 -0800 Subject: [petsc-users] undefined reference error in make test Message-ID: Hi all, I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0, running make test results in the following undefined reference error on ex19 and ex5f: --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpicc -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -g3 -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -g3 -o ex19 ex19.o -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -lrt -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib/libpetsc.a(mhyp.o): In function `MatZeroEntries_HYPREStruct_3d': /home/m.mirzadeh/soft/petsc-3.1-p7/src/dm/da/utils/mhyp.c:397: undefined reference to `hypre_StructMatrixClearBoxValues' collect2: ld returned 1 exit status make[3]: [ex19] Error 
1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpif90 -c -Wall -Wno-unused-variable -g -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -o ex5f.o ex5f.F /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -lrt -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib/libpetsc.a(mhyp.o): In function `MatZeroEntries_HYPREStruct_3d': /home/m.mirzadeh/soft/petsc-3.1-p7/src/dm/da/utils/mhyp.c:397: undefined reference to `hypre_StructMatrixClearBoxValues' collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples It seems that the function "hypre_StructMatrixClearBoxValues()" is not properly defined. This problem is new as I didn't have any trouble with petsc-3.0.0-p12. Mohammad -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Feb 25 03:58:23 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 25 Feb 2011 10:58:23 +0100 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Fri, Feb 25, 2011 at 10:45, Mohammad Mirzadeh wrote: > I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0 It seems that the Hypre team has no plans to do "general releases" so everyone uses "beta" releases instead. I see that hypre-2.7.0b has been released and petsc-3.1 might work with it, but that has not been tested. Note that --download-hypre will build a current version (hypre-2.6.0b) for you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirzadeh at gmail.com Fri Feb 25 05:18:03 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 03:18:03 -0800 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Just tried hypre-2.7.0b. That didn't solve the problem either. However, I just found this in my make.log file that might help: mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': mhyp.c:397: warning: implicit declaration of function 'hypre_StructMatrixClearBoxValues' On Fri, Feb 25, 2011 at 1:58 AM, Jed Brown wrote: > On Fri, Feb 25, 2011 at 10:45, Mohammad Mirzadeh wrote: > >> I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0 > > > It seems that the Hypre team has no plans to do "general releases" so > everyone uses "beta" releases instead. I see that hypre-2.7.0b has been > released and petsc-3.1 might work with it, but that has not been tested. 
> Note that --download-hypre will build a current version (hypre-2.6.0b) for > you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59a2.org Fri Feb 25 06:25:56 2011 From: jed at 59a2.org (Jed Brown) Date: Fri, 25 Feb 2011 13:25:56 +0100 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Fri, Feb 25, 2011 at 12:18, Mohammad Mirzadeh wrote: > Just tried hypre-2.7.0b. That didn't solve the problem either. However, I > just found this in my make.log file that might help: > > mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': > mhyp.c:397: warning: implicit declaration of function > 'hypre_StructMatrixClearBoxValues' > I just built PETSc with hypre-2.7.0b and it works for me. Are you sure that you are using the new hypre-2.7.0b instead of stale files from hypre-2.0.0? You can check that this function is declared in "_hypre_struct_mv.h", and also check that it is present in the library using $ nm ompi/lib/libHYPRE.a |grep hypre_StructMatrixClearBoxValues U hypre_StructMatrixClearBoxValues 0000000000004543 T hypre_StructMatrixClearBoxValues The second line shows that the symbol is defined in that archive. Background: they intended this to be a private function, but did not provide a public interface to zero the entries, therefore PETSc calls this private function which is declared in #include "_hypre_struct_mv.h" included via "mhyp.h" from "mhyp.c". -------------- next part -------------- An HTML attachment was scrubbed... URL: From hung.thanh.nguyen at petrell.no Fri Feb 25 06:24:32 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Fri, 25 Feb 2011 13:24:32 +0100 Subject: [petsc-users] problem to build PETCS with Window-Intel_mkl_blas_lapack_mpi Message-ID: Hi all I am new PETSC using. I am try to install PETSc on Window-Intel-MKL. I am not sure how to link PETSc with intel-mkl-blac-lapack-mpi ? I try : $./config/configure.py -with-vendor-compilers=intel \ --with-blac-lapack-dir=/opt/intel/mkl/11.1/067/ia32/lib \ --with-mpi-dir=/opt/intel/mkl//11.1/067/ia32/lib And I got the error-message : UNABLE to CONFIGURE with GIVEN OPTIONS You must specify a path for MPI with -mpi-dir= If you do not want MPI, then given -with-mpi=0 .... Best Regards Hung T. Nguyen -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 25 08:24:07 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 25 Feb 2011 08:24:07 -0600 Subject: [petsc-users] problem to build PETCS with Window-Intel_mkl_blas_lapack_mpi In-Reply-To: References: Message-ID: <742F6FBB-2635-41DA-A16A-10EBF05F6AA8@mcs.anl.gov> > --with-blac-lapack-dir should be --with-blas-lapack-dir If it fails after rerunning then please send the resulting file configure.log to petsc-maint at mcs.anl.gov Barry On Feb 25, 2011, at 6:24 AM, Hung Thanh Nguyen wrote: > Hi all > I am new PETSC using. I am try to install PETSc on Window-Intel-MKL. I am not sure how to link PETSc with intel-mkl-blac-lapack-mpi ? I try : > > $./config/configure.py ?with-vendor-compilers=intel \ --with-blac-lapack-dir=/opt/intel/mkl/11.1/067/ia32/lib \ --with-mpi-dir=/opt/intel/mkl//11.1/067/ia32/lib > > And I got the error-message : UNABLE to CONFIGURE with GIVEN OPTIONS > > You must specify a path for MPI with ?mpi-dir= > If you do not want MPI, then given ?with-mpi=0 > ?. > > Best Regards > Hung T. Nguyen > From brtnfld at uiuc.edu Fri Feb 25 10:31:35 2011 From: brtnfld at uiuc.edu (M. 
Scot Breitenfeld) Date: Fri, 25 Feb 2011 10:31:35 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> Message-ID: <4D67D967.8020109@uiuc.edu> On 02/24/2011 04:59 PM, Jed Brown wrote: > On Thu, Feb 24, 2011 at 23:49, M. Scot Breitenfeld > wrote: > > I would not say it's zero to compute the entries (I guess it takes > about > 3.5ms per particle for the calculations). This is a fairly small case, > only 8000 particles. > > > With 300 to 900 interactions per particle, times 3 for each component, > times two for lower and upper triangular piece. So we're looking at > half a microsecond per insertion. That still seems like a lot, but > perhaps the access pattern is very irregular because the particles > have an essentially random ordering. Did you build --with-debugging=0? > That should make a reasonable difference. > I have not because on the machine I'm running on the compilation fails with error: cast to type "__m64" is not allowed, I've reported it back in September (I'm going to upgrade my compiler and OS soon, so hopefully that will fix the problem). I'll recompile it on another machine and see if that helps. > Also, since the matrix is symmetric, you might consider using the > SBAIJ matrix format. That will cut your storage costs almost in half > and should speed up insertion because all interactions for a given > particle will be in the same block-row, thus nearby in memory. That would be great! But I don't see in the manual a function for creating a parallel SBAIJ matrix, only a sequential SBAIJ. From jed at 59A2.org Fri Feb 25 10:36:40 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 25 Feb 2011 17:36:40 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D67D967.8020109@uiuc.edu> References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> <4D67D967.8020109@uiuc.edu> Message-ID: On Fri, Feb 25, 2011 at 17:31, M. Scot Breitenfeld wrote: > I have not because on the machine I'm running on the compilation fails > with error: cast to type "__m64" is not allowed, I've reported it back > in September (I'm going to upgrade my compiler and OS soon, so hopefully > that will fix the problem). > Sounds like a compiler/libraries problem. That would be great! But I don't see in the manual a function for > creating a parallel SBAIJ matrix, only a sequential SBAIJ. > See MatCreateMPISBAIJ() -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 25 11:25:33 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 11:25:33 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Golam: > Oh, It thought MatMatSolve() was there as you mention something about it in > the FAQ. Please add it then. Checking carefully, I realize that petsc supports MatMatSolve() for all matrix types by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, except few matrix types (seqdense is not included) that we provided more efficient implementation. A simplified ex1.c with MatMatSolve() for petsc-dev is attached (note: this only works with petsc-dev) for your info. Hong > > Thanks, > Golam > > On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >> >> For sequential dense in-place LU factorization, see >> /src/mat/examples/tests/ex1.c. >> For parallel, you need install PLAPACK. >> >> It seems we do not have MatMatSolve() for dense matrix format. I can >> add it if you need it. 
>> >> Hong >> >> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >> wrote: >> > Hi, >> > >> > I need to use MatLUFactor(), but do not know what to pass for the last 3 >> > arguments. Would you please explain a bit; I did not find any example >> > code >> > for this one. >> > >> > PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >> > *info) >> > >> > My matrix is a dense one and the result from this call will be used in >> > MatMatSolve(). >> > >> > Regards, >> > Golam >> > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: text/x-csrc Size: 6160 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Feb 25 11:36:14 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 25 Feb 2011 11:36:14 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. Barry On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > Golam: >> Oh, It thought MatMatSolve() was there as you mention something about it in >> the FAQ. Please add it then. > > Checking carefully, I realize that petsc supports MatMatSolve() for > all matrix types > by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, > except few matrix types (seqdense is not included) that we provided > more efficient implementation. > > A simplified ex1.c with MatMatSolve() for petsc-dev is attached > (note: this only works with petsc-dev) > for your info. > > Hong > > > >> >> Thanks, >> Golam >> >> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>> >>> For sequential dense in-place LU factorization, see >>> /src/mat/examples/tests/ex1.c. >>> For parallel, you need install PLAPACK. >>> >>> It seems we do not have MatMatSolve() for dense matrix format. I can >>> add it if you need it. >>> >>> Hong >>> >>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>> wrote: >>>> Hi, >>>> >>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>> arguments. Would you please explain a bit; I did not find any example >>>> code >>>> for this one. >>>> >>>> PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>> *info) >>>> >>>> My matrix is a dense one and the result from this call will be used in >>>> MatMatSolve(). >>>> >>>> Regards, >>>> Golam >>>> >> >> > From hzhang at mcs.anl.gov Fri Feb 25 11:49:02 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 11:49:02 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: I'll add it. Hong On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > ? A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. > > ? Barry > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > >> Golam: >>> Oh, It thought MatMatSolve() was there as you mention something about it in >>> the FAQ. Please add it then. >> >> Checking carefully, I realize that petsc supports MatMatSolve() for >> all matrix types >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, >> except ?few matrix types (seqdense is not included) that we provided >> more efficient implementation. >> >> A simplified ex1.c ?with MatMatSolve() for petsc-dev is attached >> (note: this only works with petsc-dev) >> for your info. 
>> >> Hong >> >> >> >>> >>> Thanks, >>> Golam >>> >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>>> >>>> For sequential dense in-place LU factorization, see >>>> /src/mat/examples/tests/ex1.c. >>>> For parallel, you need install PLAPACK. >>>> >>>> It seems we do not have MatMatSolve() for dense matrix format. I can >>>> add it if you need it. >>>> >>>> Hong >>>> >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>>> wrote: >>>>> Hi, >>>>> >>>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>>> arguments. Would you please explain a bit; I did not find any example >>>>> code >>>>> for this one. >>>>> >>>>> PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>>> *info) >>>>> >>>>> My matrix is a dense one and the result from this call will be used in >>>>> MatMatSolve(). >>>>> >>>>> Regards, >>>>> Golam >>>>> >>> >>> >> > > From mirzadeh at gmail.com Fri Feb 25 17:03:18 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 15:03:18 -0800 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Thanks Jed. I think the problem was petsc using some of files from hypre 2.0 although I was compiling it with 2.7. I did a fresh install with hypre 2.7 and now everything is running fine. Thanks again, Mohammad On Fri, Feb 25, 2011 at 4:25 AM, Jed Brown wrote: > On Fri, Feb 25, 2011 at 12:18, Mohammad Mirzadeh wrote: > >> Just tried hypre-2.7.0b. That didn't solve the problem either. However, I >> just found this in my make.log file that might help: >> >> mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': >> mhyp.c:397: warning: implicit declaration of function >> 'hypre_StructMatrixClearBoxValues' >> > > I just built PETSc with hypre-2.7.0b and it works for me. Are you sure that > you are using the new hypre-2.7.0b instead of stale files from hypre-2.0.0? > You can check that this function is declared in "_hypre_struct_mv.h", and > also check that it is present in the library using > > $ nm ompi/lib/libHYPRE.a |grep hypre_StructMatrixClearBoxValues > U hypre_StructMatrixClearBoxValues > 0000000000004543 T hypre_StructMatrixClearBoxValues > > The second line shows that the symbol is defined in that archive. > > Background: they intended this to be a private function, but did not > provide a public interface to zero the entries, therefore PETSc calls this > private function which is declared in > > #include "_hypre_struct_mv.h" > > included via "mhyp.h" from "mhyp.c". > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 25 18:43:10 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 18:43:10 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Customized MatMatSolve_SeqDense() is added to petsc-dev. Hong On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > ? A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. > > ? Barry > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > >> Golam: >>> Oh, It thought MatMatSolve() was there as you mention something about it in >>> the FAQ. Please add it then. >> >> Checking carefully, I realize that petsc supports MatMatSolve() for >> all matrix types >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, >> except ?few matrix types (seqdense is not included) that we provided >> more efficient implementation. 
>> >> A simplified ex1.c ?with MatMatSolve() for petsc-dev is attached >> (note: this only works with petsc-dev) >> for your info. >> >> Hong >> >> >> >>> >>> Thanks, >>> Golam >>> >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>>> >>>> For sequential dense in-place LU factorization, see >>>> /src/mat/examples/tests/ex1.c. >>>> For parallel, you need install PLAPACK. >>>> >>>> It seems we do not have MatMatSolve() for dense matrix format. I can >>>> add it if you need it. >>>> >>>> Hong >>>> >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>>> wrote: >>>>> Hi, >>>>> >>>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>>> arguments. Would you please explain a bit; I did not find any example >>>>> code >>>>> for this one. >>>>> >>>>> PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>>> *info) >>>>> >>>>> My matrix is a dense one and the result from this call will be used in >>>>> MatMatSolve(). >>>>> >>>>> Regards, >>>>> Golam >>>>> >>> >>> >> > > From mgrabbani at gmail.com Fri Feb 25 19:19:25 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Fri, 25 Feb 2011 17:19:25 -0800 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Thanks. Although I have never used petsc-dev but the standard version seemed easy to me( I am new to linux as well)... will let you know once i have tried it out. Golam On Fri, Feb 25, 2011 at 4:43 PM, Hong Zhang wrote: > Customized MatMatSolve_SeqDense() is added to petsc-dev. > > Hong > > On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > > > A custom MatMatSolve_SeqDense() will work much better than the one that > does a single solve at a time because it will use Blas 3 unstead of 2. > > > > Barry > > > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > > > >> Golam: > >>> Oh, It thought MatMatSolve() was there as you mention something about > it in > >>> the FAQ. Please add it then. > >> > >> Checking carefully, I realize that petsc supports MatMatSolve() for > >> all matrix types > >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs > vector, > >> except few matrix types (seqdense is not included) that we provided > >> more efficient implementation. > >> > >> A simplified ex1.c with MatMatSolve() for petsc-dev is attached > >> (note: this only works with petsc-dev) > >> for your info. > >> > >> Hong > >> > >> > >> > >>> > >>> Thanks, > >>> Golam > >>> > >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang > wrote: > >>>> > >>>> For sequential dense in-place LU factorization, see > >>>> /src/mat/examples/tests/ex1.c. > >>>> For parallel, you need install PLAPACK. > >>>> > >>>> It seems we do not have MatMatSolve() for dense matrix format. I can > >>>> add it if you need it. > >>>> > >>>> Hong > >>>> > >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani > >>>> wrote: > >>>>> Hi, > >>>>> > >>>>> I need to use MatLUFactor(), but do not know what to pass for the > last 3 > >>>>> arguments. Would you please explain a bit; I did not find any example > >>>>> code > >>>>> for this one. > >>>>> > >>>>> PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo > >>>>> *info) > >>>>> > >>>>> My matrix is a dense one and the result from this call will be used > in > >>>>> MatMatSolve(). > >>>>> > >>>>> Regards, > >>>>> Golam > >>>>> > >>> > >>> > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frfoust at yahoo.com Sat Feb 26 18:56:55 2011 From: frfoust at yahoo.com (F R Foust) Date: Sat, 26 Feb 2011 16:56:55 -0800 (PST) Subject: [petsc-users] Search order for BLAS Message-ID: <739965.61334.qm@web130223.mail.mud.yahoo.com> I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. Thanks much! From balay at mcs.anl.gov Sat Feb 26 19:02:59 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 26 Feb 2011 19:02:59 -0600 (CST) Subject: [petsc-users] Search order for BLAS In-Reply-To: <739965.61334.qm@web130223.mail.mud.yahoo.com> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: I've changed the search order - so it should look for atlas last [as it can usually be found in /usr/lib]. For non-system default packages the search order shouldn't matter. This change should be available in the next patch update to petsc-3.1 http://petsc.cs.iit.edu/petsc/releases/BuildSystem-3.1/rev/838c7bfa03e0 Satish On Sat, 26 Feb 2011, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. > > Thanks much! > > > > From jed at 59A2.org Sat Feb 26 19:03:37 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 27 Feb 2011 02:03:37 +0100 Subject: [petsc-users] Search order for BLAS In-Reply-To: <739965.61334.qm@web130223.mail.mud.yahoo.com> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I > tried to point configure at the mkl directory with --with-blas-lapack-dir, > but it found an installation of ATLAS already in /usr/local/lib and used > that instead. > Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. > I poked around and found a fixed search order in > BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then > MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to > configure (I mean, so I don't have to modify BlasLapack.py, which is what I > did). 
Or alternatively, is there a way to force the issue by using > --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the > correct incantation to include all of the stuff MKL needs to link against. > You can find what is necessary here, then put it in --with-blas-lapack-lib. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From frfoust at yahoo.com Sat Feb 26 19:08:30 2011 From: frfoust at yahoo.com (F R Foust) Date: Sat, 26 Feb 2011 17:08:30 -0800 (PST) Subject: [petsc-users] Search order for BLAS In-Reply-To: Message-ID: <915412.68663.qm@web130203.mail.mud.yahoo.com> No, my copy of MKL is somewhere else, not in /usr/local/lib.? I understand that /usr/local/lib should probably be searched automatically.? Thanks for the link advisor reference -- that's enormously helpful. Looks like Satish just went ahead and changed the search order, so I guess it's moot now.? Thanks much, guys. FR Foust --- On Sat, 2/26/11, Jed Brown wrote: On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: I was trying to build petsc using MKL blas/lapack and had a small issue. ?I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. ??I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). ?Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? ?I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. You can find what is necessary here, then put it in --with-blas-lapack-lib. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 26 19:36:50 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 26 Feb 2011 19:36:50 -0600 Subject: [petsc-users] Search order for BLAS In-Reply-To: References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> On Feb 26, 2011, at 7:03 PM, Jed Brown wrote: > On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. > > Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. > > I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? 
I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. > > You can find what is necessary here, then put it in --with-blas-lapack-lib. > > http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ Very powerful, but why does it only support a tiny number of versions? Barry From jed at 59A2.org Sat Feb 26 19:41:12 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 27 Feb 2011 02:41:12 +0100 Subject: [petsc-users] Search order for BLAS In-Reply-To: <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> Message-ID: Why make your library so confusing you need an online calculator to find out how to link it? On Feb 27, 2011 2:37 AM, "Barry Smith" wrote: On Feb 26, 2011, at 7:03 PM, Jed Brown wrote: > On Sun, Feb 27, 2011 at 01:56, F R Foust From gianmail at gmail.com Mon Feb 28 04:46:45 2011 From: gianmail at gmail.com (Gianluca Meneghello) Date: Mon, 28 Feb 2011 11:46:45 +0100 Subject: [petsc-users] Tridiagonal and pentadiagonal matrices Message-ID: Hi, I was looking for the best way to solve tridiagonal and pentadiagonal matrix in Petsc. Is there a specific matrix format/solver for these kind of systems I should use? The tridiagonal/pentadiagonal matrix I have to solve corresponds to the main 3/5 diagonals of a bigger matrix (if it can help, I'm trying to solve a system using block-line Gauss Seidel). I've seen there is an easy way to obtain the main diagonal of the matrix (MatGetDiagonal). Is there an equivalent way to extract the other data? Thanks Gianluca -- "[Je pense que] l'homme est un monde qui vaut des fois les mondes et que les plus ardentes ambitions sont celles qui ont eu l'orgueil de l'Anonymat" -- Non omnibus, sed mihi et tibi Amedeo Modigliani From hzhang at mcs.anl.gov Mon Feb 28 11:26:25 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 28 Feb 2011 11:26:25 -0600 Subject: [petsc-users] Tridiagonal and pentadiagonal matrices In-Reply-To: References: Message-ID: Gianluca : > > I was looking for the best way to solve tridiagonal and pentadiagonal > matrix in Petsc. Is there a specific matrix format/solver for these > kind of systems I should use? No. For sequential matrix, you may use LAPACK routines for band or tridiagonal matrices. > > The tridiagonal/pentadiagonal matrix I have to solve corresponds to > the main 3/5 diagonals of a bigger matrix (if it can help, I'm trying > to solve a system using block-line Gauss Seidel). I've seen there is > an easy way to obtain the main diagonal of the matrix > (MatGetDiagonal). Is there an equivalent way to extract the other > data? You may use MatGetSubMatrix(). For efficient assemble of your submatrix, you may look into the private date structure (AIJ?) and obtain your submatrix. For aij format, check aij.h or mpiaij.h for its datastructure. Hong > > Thanks > > Gianluca > > -- > "[Je pense que] l'homme est un monde qui vaut des fois les mondes et > que les plus ardentes ambitions sont celles qui ont eu l'orgueil de > l'Anonymat" -- Non omnibus, sed mihi et tibi > Amedeo Modigliani > From jdbst21 at gmail.com Mon Feb 28 22:07:33 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:07:33 -0500 Subject: [petsc-users] Multiple Copies of KSP Message-ID: Hello, I was wondering if it is possible to have multiple KSP running, each on their own core but in the same program. When I have tried this before using MPI_COMM_SELF, I get an error. 
Thank you Joshua Booth -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 28 22:11:05 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Feb 2011 22:11:05 -0600 (CST) Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: yes you can create as many KSP solves as you need. check src/ksp/ksp/examples/tutorials/ex9.c satish On Mon, 28 Feb 2011, Joshua Booth wrote: > Hello, > > I was wondering if it is possible to have multiple KSP running, each on > their own core but in the same program. > When I have tried this before using MPI_COMM_SELF, I get an error. > > Thank you > > Joshua Booth > From bsmith at mcs.anl.gov Mon Feb 28 22:11:38 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 28 Feb 2011 22:11:38 -0600 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > Hello, > > I was wondering if it is possible to have multiple KSP running, each on their own core but in the same program. Yes. If you want a separate linear solve on each MPI process then you use MPI_COMM_SELF for the KSP, note that you also need to use that same MPI_COMM_SELF for creating the Vecs and Mats since each process needs its own for its particular linear system. > When I have tried this before using MPI_COMM_SELF, I get an error. We would need more information to determine what has gone wrong. Barry > > Thank you > > Joshua Booth From jdbst21 at gmail.com Mon Feb 28 22:16:05 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:16:05 -0500 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: I meant that each core was running its own ksp at the same time. Not two different solvers over the same world. On Mon, Feb 28, 2011 at 11:11 PM, Satish Balay wrote: > yes you can create as many KSP solves as you need. > > check src/ksp/ksp/examples/tutorials/ex9.c > > satish > > On Mon, 28 Feb 2011, Joshua Booth wrote: > > > Hello, > > > > I was wondering if it is possible to have multiple KSP running, each on > > their own core but in the same program. > > When I have tried this before using MPI_COMM_SELF, I get an error. > > > > Thank you > > > > Joshua Booth > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdbst21 at gmail.com Mon Feb 28 22:19:15 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:19:15 -0500 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> References: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> Message-ID: Hello, In reply, I get a Segment fault even though I only call: KSP; KSPCreate(MPI_COMM_SELF, &ksp); Again... note that this is being done by all cores in the MPI_COMM_WORLD at the same time. On Mon, Feb 28, 2011 at 11:11 PM, Barry Smith wrote: > > On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > > > Hello, > > > > I was wondering if it is possible to have multiple KSP running, each on > their own core but in the same program. > > Yes. If you want a separate linear solve on each MPI process then you use > MPI_COMM_SELF for the KSP, note that you also need to use that same > MPI_COMM_SELF for creating the Vecs and Mats since each process needs its > own for its particular linear system. > > > When I have tried this before using MPI_COMM_SELF, I get an error. 
> > We would need more information to determine what has gone wrong. > > Barry > > > > > > Thank you > > > > Joshua Booth > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 28 23:12:32 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Feb 2011 23:12:32 -0600 (CST) Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> Message-ID: It should work. you'll have to use gdb to determine exact location and reason for this crash in your code. satish On Mon, 28 Feb 2011, Joshua Booth wrote: > Hello, > > In reply, > > I get a Segment fault even though I only call: > KSP; > KSPCreate(MPI_COMM_SELF, &ksp); > > Again... note that this is being done by all cores in the MPI_COMM_WORLD at > the same time. > > On Mon, Feb 28, 2011 at 11:11 PM, Barry Smith wrote: > > > > > On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > > > > > Hello, > > > > > > I was wondering if it is possible to have multiple KSP running, each on > > their own core but in the same program. > > > > Yes. If you want a separate linear solve on each MPI process then you use > > MPI_COMM_SELF for the KSP, note that you also need to use that same > > MPI_COMM_SELF for creating the Vecs and Mats since each process needs its > > own for its particular linear system. > > > > > When I have tried this before using MPI_COMM_SELF, I get an error. > > > > We would need more information to determine what has gone wrong. > > > > Barry > > > > > > > > > > Thank you > > > > > > Joshua Booth > > > > >
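For reference, the pattern described in this thread (one independent KSP per process, with the Mat and Vecs also created on PETSC_COMM_SELF) can be sketched as follows. This is an illustrative fragment, not code from the thread; it uses petsc-3.1 era calling conventions and a made-up 1-D Laplacian as the per-process system.

#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PetscInt       i, ncols, cols[3], n = 100;
  PetscScalar    vals[3];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* Each process assembles its own sequential matrix and vectors. */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, PETSC_NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                     /* simple 1-D Laplacian */
    ncols = 0;
    if (i > 0)   { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
    cols[ncols] = i; vals[ncols] = 2.0; ncols++;
    if (i < n-1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
    ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* The KSP lives on PETSC_COMM_SELF as well, so every rank solves
     its own system independently and at the same time. */
  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(ksp);CHKERRQ(ierr);         /* petsc-3.1 takes the object, not a pointer */
  ierr = VecDestroy(b);CHKERRQ(ierr);
  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run with, e.g., mpiexec -n 4 on the resulting executable; each of the four processes then sets up and solves its own 100x100 system with no communication between ranks.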