From hung.thanh.nguyen at petrell.no Tue Feb 1 08:01:43 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Tue, 01 Feb 2011 15:01:43 +0100 Subject: [petsc-users] (no subject) Message-ID: I am trying to install PETSc on Windows. I am using Intel C compiler and Intel MKL. I will use whichever MPI library is most appropriate (MPICH?). I have tried to follow the installation instructions: installed cygwin and try to run configure in the cygwin shell. But I get some error regarding some python code or compiler ( not sure which). My main question is: Does anybode have a more detailed installation instruction for PETSc on windows systems? My goal is to have PETSc loaded as a library in to my visual studio project. Alternatively: Should I post the error message I get when I try to configure and take it from there? Any help would be much appreciated! Hung Research in matrix solvers -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 08:09:51 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 08:09:51 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Tue, Feb 1, 2011 at 8:01 AM, Hung Thanh Nguyen < hung.thanh.nguyen at petrell.no> wrote: > I am trying to install PETSc on Windows. I am using Intel C compiler and > Intel MKL. I will use whichever MPI library is most appropriate (MPICH?). > > > > I have tried to follow the installation instructions: installed cygwin and > try to run configure in the cygwin shell. But I get some error regarding > some python code or compiler ( not sure which). > > > > My main question is: Does anybode have a more detailed installation > instruction for PETSc on windows systems? My goal is to have PETSc loaded as > a library in to my visual studio project. > > > > Alternatively: Should I post the error message I get when I try to > configure and take it from there? > Yes, please send the entire error message to petsc-maint at mcs.anl.gov. Matt > > > Any help would be much appreciated! > > Hung > > Research in matrix solvers > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhender at us.ibm.com Tue Feb 1 09:03:51 2011 From: mhender at us.ibm.com (Michael E Henderson) Date: Tue, 1 Feb 2011 10:03:51 -0500 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse Message-ID: Good morning, I'm trying to use gmres with an ilu preconditioner and having trouble getting the options right. I figure it's got to be something simple, so hope it's an easy question. With options: -ksp_type gmres -pc_type ilu -pc_factor_levels 10 -pc_factor_fill 10 -pc_factor_mat_solver_package spai I get the message: unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver spai. Perhaps you must config/configure.py with --download-spai I checked the configuration output and spai was indeed configured and built. I also tried spooles with a similar result. The table http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html seems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is that true? 
-pc_factor_mat_solver_package hypre -pc_hypre_type euclid also gives unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver hypre. Perhaps you must config/configure.py with --download-hypre I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. Am I doing something obviously wrong? Thanks, Mike Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 09:44:38 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 09:44:38 -0600 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse In-Reply-To: References: Message-ID: On Tue, Feb 1, 2011 at 9:03 AM, Michael E Henderson wrote: > Good morning, > > I'm trying to use gmres with an ilu preconditioner and having trouble > getting the options right. I figure it's got to be something simple, so hope > it's an easy question. > This is the joy of using other packages. Looking at the source, I see 4 packages which can factor a parallel (MPIAIJ) matrix: MUMPS, SuperLU_dist, Spooles (now unsupported), Pastix These are all usable from -pc_factor_mat_solver_package when using MPIAIJ. 1) We do not consider SPAI a matrix factorization package. You just use it with --pc_type spai 2) We cannot see inside Hypre, and thus it is hard to get into this framework. You use Euclid with -pc_type hypre -pc_hypre_type euclid Matt > With options: > > -ksp_type gmres > -pc_type ilu > -pc_factor_levels 10 > -pc_factor_fill 10 > -pc_factor_mat_solver_package spai > > I get the message: > > unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix > format mpiaij does not have a solver spai. Perhaps you must > config/configure.py with --download-spai > > I checked the configuration output and spai was indeed configured and > built. I also tried spooles with a similar result. > > The table > http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.htmlseems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is > that true? > > -pc_factor_mat_solver_package hypre > -pc_hypre_type euclid > > also gives > > unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix > format mpiaij does not have a solver hypre. Perhaps you must > config/configure.py with --download-hypre > > I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. > Am I doing something obviously wrong? > > Thanks, > > Mike Henderson > > ------------------------------------------------------------------------------------------------------------------------------------ > Mathematical Sciences, TJ Watson Research Center > mhender at watson.ibm.com > http://www.research.ibm.com/people/h/henderson/ > http://multifario.sourceforge.net/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
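For reference, a minimal sketch of the option combinations described in the reply above; the executable name and process count are placeholders, and each variant assumes the corresponding package was enabled when PETSc was configured:

  # full parallel LU factorization through an external package such as MUMPS
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type lu -pc_factor_mat_solver_package mumps
  # parallel ILU(k) through Hypre's Euclid
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type hypre -pc_hypre_type euclid
  # sparse approximate inverse preconditioning with SPAI
  mpiexec -n 2 ./myprog -ksp_type gmres -pc_type spai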
URL: From mhender at us.ibm.com Tue Feb 1 11:14:38 2011 From: mhender at us.ibm.com (Michael E Henderson) Date: Tue, 1 Feb 2011 12:14:38 -0500 Subject: [petsc-users] options for gmres w/ ilu precondioner for sparse In-Reply-To: References: Message-ID: Thanks ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ From: Matthew Knepley To: PETSc users list Date: 02/01/2011 10:44 AM Subject: Re: [petsc-users] options for gmres w/ ilu precondioner for sparse Sent by: petsc-users-bounces at mcs.anl.gov On Tue, Feb 1, 2011 at 9:03 AM, Michael E Henderson wrote: Good morning, I'm trying to use gmres with an ilu preconditioner and having trouble getting the options right. I figure it's got to be something simple, so hope it's an easy question. This is the joy of using other packages. Looking at the source, I see 4 packages which can factor a parallel (MPIAIJ) matrix: MUMPS, SuperLU_dist, Spooles (now unsupported), Pastix These are all usable from -pc_factor_mat_solver_package when using MPIAIJ. 1) We do not consider SPAI a matrix factorization package. You just use it with --pc_type spai 2) We cannot see inside Hypre, and thus it is hard to get into this framework. You use Euclid with -pc_type hypre -pc_hypre_type euclid Matt With options: -ksp_type gmres -pc_type ilu -pc_factor_levels 10 -pc_factor_fill 10 -pc_factor_mat_solver_package spai I get the message: unknown: [1|MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver spai. Perhaps you must config/configure.py with --download-spai I checked the configuration output and spai was indeed configured and built. I also tried spooles with a similar result. The table http://www.mcs.anl.gov/petsc/petsc-as/documentation/linearsolvertable.html seems to be saying that only hypre/euclid can be used for ilu(k) w/ aij. Is that true? -pc_factor_mat_solver_package hypre -pc_hypre_type euclid also gives unknown: [1MatGetFactor() line 3646 in src/mat/interface/matrix.c: Matrix format mpiaij does not have a solver hypre. Perhaps you must config/configure.py with --download-hypre I'm using hypre as a preconditioer elsewhere, so I'm sure it's installed. Am I doing something obviously wrong? Thanks, Mike Henderson ------------------------------------------------------------------------------------------------------------------------------------ Mathematical Sciences, TJ Watson Research Center mhender at watson.ibm.com http://www.research.ibm.com/people/h/henderson/ http://multifario.sourceforge.net/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amesga1 at tigers.lsu.edu Tue Feb 1 14:14:32 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:14:32 -0600 Subject: [petsc-users] KspTrueResidualNorm Message-ID: Dear all, I'm using Ksp *gmres* with *boomeramg* preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand *KSPSetNormType* gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? 
Do I need to Introduce my own convergence test? Best, A. Mesgarnejad -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Feb 1 14:20:28 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 1 Feb 2011 14:20:28 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: Message-ID: You can use runtime option '-ksp_monitor_true_residual'. Hong On Tue, Feb 1, 2011 at 2:14 PM, Ataollah Mesgarnejad wrote: > Dear all, > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even > though it converges in preconditioned norm it doesn't converge in true norm, > but as I understand KSPSetNormType gmres does not support true residual > norm? Is that correct and If it is, is there any other way to monitor the > true norm? Do I need to Introduce my own convergence test? > > Best, > A. Mesgarnejad > From bsmith at mcs.anl.gov Tue Feb 1 14:21:49 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 14:21:49 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: Message-ID: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned Barry On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > Dear all, > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > Best, > A. Mesgarnejad From amesga1 at tigers.lsu.edu Tue Feb 1 14:32:17 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:32:17 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: Barry, I did as you said and now I receive these errors once I try to run the program: [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp Does right preconditiong work with *HYPRE*? Best, Ata On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > The simplest thing is to simply run with -ksp_monitor_true_residual and > see how the convergence is going in the true residual norm also. 
> > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > Barry > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > Dear all, > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > Best, > > A. Mesgarnejad > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 1 14:35:41 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 14:35:41 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Right preconditioner shouldn't matter to the preconditioner at all. What is the complete error message. Does it run on one process? Barry On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > Barry, > > I did as you said and now I receive these errors once I try to run the program: > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > Does right preconditiong work with HYPRE? > Best, > Ata > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. > > You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > Barry > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > Dear all, > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > > > Best, > > A. Mesgarnejad > > > > > -- > A. 
Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 From amesga1 at tigers.lsu.edu Tue Feb 1 14:35:38 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 14:35:38 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> Message-ID: Sorry for too many replies but it worked with -Ksp_monitor !!! I reported the same kind of problem before too.
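For reference, a minimal sketch of the kind of run being discussed in this thread, monitoring the true residual of a BoomerAMG-preconditioned GMRES solve; the executable name and process count are placeholders:

  mpiexec -n 5 ./myprog -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor_true_residual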
Best, Ata On Tue, Feb 1, 2011 at 2:32 PM, Ataollah Mesgarnejad wrote: > Barry, > > I did as you said and now I receive these errors once I try to run the > program: > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > Does right preconditiong work with *HYPRE*? > Best, > Ata > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > >> >> The simplest thing is to simply run with -ksp_monitor_true_residual and >> see how the convergence is going in the true residual norm also. >> >> You can also switch to right preconditioning with gmres and then the >> residual used by gmres is the true residual norm. Use >> -ksp_preconditioner_side right -ksp_norm_type unpreconditioned >> >> Barry >> >> >> >> On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: >> >> > Dear all, >> > >> > I'm using Ksp gmres with boomeramg preconditioning and I suspect that >> even though it converges in preconditioned norm it doesn't converge in true >> norm, but as I understand KSPSetNormType gmres does not support true >> residual norm? Is that correct and If it is, is there any other way to >> monitor the true norm? Do I need to Introduce my own convergence test? >> > >> > Best, >> > A. Mesgarnejad >> >> > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From amesga1 at tigers.lsu.edu Tue Feb 1 15:27:03 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 15:27:03 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: I compile it with openmpi and it runs on 5 cores. And thats all the error message from PETSC. 
the compelete output looks like this: [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun has exited due to process rank 3 with PID 3845 on node me-1203svr3.lsu.edu exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here). Best, Ata On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > Right preconditioner shouldn't matter to the preconditioner at all. What > is the complete error message. Does it run on one process? > > Barry > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > Barry, > > > > I did as you said and now I receive these errors once I try to run the > program: > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > Does right preconditiong work with HYPRE? > > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > > > The simplest thing is to simply run with -ksp_monitor_true_residual and > see how the convergence is going in the true residual norm also. > > > > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. 
Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > Barry > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > Dear all, > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > Best, > > > A. Mesgarnejad > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 1 15:32:22 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Feb 2011 15:32:22 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad wrote: > I compile it with openmpi and it runs on 5 cores. And thats all the error > message from PETSC. the compelete output looks like this: > Something is wrong with your output gathering. It would indicate the error, or a signal received. Matt > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > with errorcode 1. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun has exited due to process rank 3 with PID 3845 on > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). > > Best, > Ata > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > >> >> Right preconditioner shouldn't matter to the preconditioner at all. What >> is the complete error message. Does it run on one process? 
>> >> Barry >> >> On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: >> >> > Barry, >> > >> > I did as you said and now I receive these errors once I try to run the >> program: >> > >> > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in >> src/dm/da/utils/mhyp.c >> > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in >> src/ksp/pc/impls/hypre/hypre.c >> > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c >> > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp >> > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in >> src/dm/da/utils/mhyp.c >> > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in >> src/ksp/pc/impls/hypre/hypre.c >> > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c >> > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c >> > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp >> > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c >> > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp >> > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp >> > >> > Does right preconditiong work with HYPRE? >> > Best, >> > Ata >> > >> > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: >> > >> > The simplest thing is to simply run with -ksp_monitor_true_residual >> and see how the convergence is going in the true residual norm also. >> > >> > You can also switch to right preconditioning with gmres and then the >> residual used by gmres is the true residual norm. Use >> -ksp_preconditioner_side right -ksp_norm_type unpreconditioned >> > >> > Barry >> > >> > >> > >> > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: >> > >> > > Dear all, >> > > >> > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that >> even though it converges in preconditioned norm it doesn't converge in true >> norm, but as I understand KSPSetNormType gmres does not support true >> residual norm? Is that correct and If it is, is there any other way to >> monitor the true norm? Do I need to Introduce my own convergence test? >> > > >> > > Best, >> > > A. Mesgarnejad >> > >> > >> > >> > >> > -- >> > A. Mesgarnejad >> > PhD Student, Research Assistant >> > Mechanical Engineering Department >> > Louisiana State University >> > 2203 Patrick F. Taylor Hall >> > Baton Rouge, La 70803 >> >> > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 1 15:34:45 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 1 Feb 2011 15:34:45 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> Message-ID: <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> Suggest running with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to see if there is some memory corruption. If that doesn't help suggest a build with MPICH to see if the same or a different problem happens. Getting a partial error message like this is not normal and rarely seen. 
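For reference, a minimal sketch of the kind of valgrind run suggested above, along the lines of the linked FAQ entry; the executable name, its options, and the process count are placeholders:

  # -malloc off bypasses PETSc's own allocation checking so valgrind sees the raw mallocs
  mpiexec -n 5 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./myprog -malloc off [your usual options]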
Barry On Feb 1, 2011, at 3:32 PM, Matthew Knepley wrote: > On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad wrote: > I compile it with openmpi and it runs on 5 cores. And thats all the error message from PETSC. the compelete output looks like this: > > Something is wrong with your output gathering. It would indicate the error, or a signal received. > > Matt > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > with errorcode 1. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun has exited due to process rank 3 with PID 3845 on > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). > > Best, > Ata > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > Right preconditioner shouldn't matter to the preconditioner at all. What is the complete error message. Does it run on one process? > > Barry > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > Barry, > > > > I did as you said and now I receive these errors once I try to run the program: > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > Does right preconditiong work with HYPRE? 
> > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith wrote: > > > > The simplest thing is to simply run with -ksp_monitor_true_residual and see how the convergence is going in the true residual norm also. > > > > You can also switch to right preconditioning with gmres and then the residual used by gmres is the true residual norm. Use -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > Barry > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > Dear all, > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that even though it converges in preconditioned norm it doesn't converge in true norm, but as I understand KSPSetNormType gmres does not support true residual norm? Is that correct and If it is, is there any other way to monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > Best, > > > A. Mesgarnejad > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > > > > -- > A. Mesgarnejad > PhD Student, Research Assistant > Mechanical Engineering Department > Louisiana State University > 2203 Patrick F. Taylor Hall > Baton Rouge, La 70803 > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From amesga1 at tigers.lsu.edu Tue Feb 1 15:41:17 2011 From: amesga1 at tigers.lsu.edu (Ataollah Mesgarnejad) Date: Tue, 1 Feb 2011 15:41:17 -0600 Subject: [petsc-users] KspTrueResidualNorm In-Reply-To: <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> References: <505C9453-8F10-4F38-AA2A-A05C8252C107@mcs.anl.gov> <4F99FCD7-A7E0-4516-9A4E-A73C7F1F69A8@mcs.anl.gov> <98C79BFE-D40E-4502-8A0B-336DD60098E7@mcs.anl.gov> Message-ID: I already used Valgrind to check for memory and as far I can tell the program was working fine. At this stage my concern is mainly about the convergence of ksp solver. I can later try compiling with petsc-dev. Thank you for your time, Ata On Tue, Feb 1, 2011 at 3:34 PM, Barry Smith wrote: > > Suggest running with valgrind > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to > see if there is some memory corruption. If that doesn't help suggest a build > with MPICH to see if the same or a different problem happens. > > Getting a partial error message like this is not normal and rarely seen. > > Barry > > On Feb 1, 2011, at 3:32 PM, Matthew Knepley wrote: > > > On Tue, Feb 1, 2011 at 3:27 PM, Ataollah Mesgarnejad < > amesga1 at tigers.lsu.edu> wrote: > > I compile it with openmpi and it runs on 5 cores. And thats all the error > message from PETSC. the compelete output looks like this: > > > > Something is wrong with your output gathering. It would indicate the > error, or a signal received. 
> > > > Matt > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > -------------------------------------------------------------------------- > > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > > with errorcode 1. > > > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > > You may or may not see output from other processes, depending on > > exactly when Open MPI kills them. > > > -------------------------------------------------------------------------- > > > -------------------------------------------------------------------------- > > mpirun has exited due to process rank 3 with PID 3845 on > > node me-1203svr3.lsu.edu exiting without calling "finalize". This may > > have caused other processes in the application to be > > terminated by signals sent by mpirun (as reported here). > > > > Best, > > Ata > > > > On Tue, Feb 1, 2011 at 2:35 PM, Barry Smith wrote: > > > > Right preconditioner shouldn't matter to the preconditioner at all. What > is the complete error message. Does it run on one process? > > > > Barry > > > > On Feb 1, 2011, at 2:32 PM, Ataollah Mesgarnejad wrote: > > > > > Barry, > > > > > > I did as you said and now I receive these errors once I try to run the > program: > > > > > > [3]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > > [3]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > > [3]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > > [3]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > > [4]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in > src/dm/da/utils/mhyp.c > > > [4]PETSC ERROR: PCSetUp_HYPRE() line 112 in > src/ksp/pc/impls/hypre/hypre.c > > > [4]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c > > > [4]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c > > > [3]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > [4]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c > > > [4]PETSC ERROR: UStep() line 511 in PFMAT-FD.cpp > > > [4]PETSC ERROR: main() line 95 in PFMAT-main.cpp > > > > > > Does right preconditiong work with HYPRE? > > > Best, > > > Ata > > > > > > On Tue, Feb 1, 2011 at 2:21 PM, Barry Smith > wrote: > > > > > > The simplest thing is to simply run with -ksp_monitor_true_residual > and see how the convergence is going in the true residual norm also. 
> > > > > > You can also switch to right preconditioning with gmres and then the > residual used by gmres is the true residual norm. Use > -ksp_preconditioner_side right -ksp_norm_type unpreconditioned > > > > > > Barry > > > > > > > > > > > > On Feb 1, 2011, at 2:14 PM, Ataollah Mesgarnejad wrote: > > > > > > > Dear all, > > > > > > > > I'm using Ksp gmres with boomeramg preconditioning and I suspect that > even though it converges in preconditioned norm it doesn't converge in true > norm, but as I understand KSPSetNormType gmres does not support true > residual norm? Is that correct and If it is, is there any other way to > monitor the true norm? Do I need to Introduce my own convergence test? > > > > > > > > Best, > > > > A. Mesgarnejad > > > > > > > > > > > > > > > -- > > > A. Mesgarnejad > > > PhD Student, Research Assistant > > > Mechanical Engineering Department > > > Louisiana State University > > > 2203 Patrick F. Taylor Hall > > > Baton Rouge, La 70803 > > > > > > > > > > -- > > A. Mesgarnejad > > PhD Student, Research Assistant > > Mechanical Engineering Department > > Louisiana State University > > 2203 Patrick F. Taylor Hall > > Baton Rouge, La 70803 > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > -- A. Mesgarnejad PhD Student, Research Assistant Mechanical Engineering Department Louisiana State University 2203 Patrick F. Taylor Hall Baton Rouge, La 70803 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 16:46:29 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 16:46:29 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core Message-ID: Hi, I am trying to configure my petsc install with an MPI installation to make use of a dual quad-core desktop system running Ubuntu. But eventhough the configure/make process went through without problems, the scalability of the programs don't seem to reflect what I expected. My configure options are --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ --download-plapack=1 --download-mumps=1 --download-umfpack=yes --with-debugging=1 --with-errorchecking=yes Is there something else that needs to be done as part of the configure process to enable a decent scaling ? I am only comparing programs with mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the same time as noted from -log_summary. If it helps, I've been testing with snes/examples/tutorials/ex20.c for all purposes with a custom -grid parameter from command-line to control the number of unknowns. If there is something you've witnessed before in this configuration or if you need anything else to analyze the problem, do let me know. Thanks, Vijay From knepley at gmail.com Wed Feb 2 16:53:32 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 16:53:32 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan wrote: > Hi, > > I am trying to configure my petsc install with an MPI installation to > make use of a dual quad-core desktop system running Ubuntu. 
But > eventhough the configure/make process went through without problems, > the scalability of the programs don't seem to reflect what I expected. > My configure options are > > --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 > --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > --download-plapack=1 --download-mumps=1 --download-umfpack=yes > --with-debugging=1 --with-errorchecking=yes > 1) For performance studies, make a build using --with-debugging=0 2) Look at -log_summary for a breakdown of performance Matt > Is there something else that needs to be done as part of the configure > process to enable a decent scaling ? I am only comparing programs with > mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the > same time as noted from -log_summary. If it helps, I've been testing > with snes/examples/tutorials/ex20.c for all purposes with a custom > -grid parameter from command-line to control the number of unknowns. > > If there is something you've witnessed before in this configuration or > if you need anything else to analyze the problem, do let me know. > > Thanks, > Vijay > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 17:04:31 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 17:04:31 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Matt, The -with-debugging=1 option is certainly not meant for performance studies but I didn't expect it to yield the same cpu time as a single processor for snes/ex20 i.e., my runs with 1 and 2 processors take approximately the same amount of time for computation of solution. But I am currently configuring without debugging symbols and shall let you know what that yields. On a similar note, is there something extra that needs to be done to make use of multi-core machines while using MPI ? I am not sure if this is even related to PETSc but could be an MPI configuration option that maybe either I or the configure process is missing. All ideas are much appreciated. Vijay On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan > wrote: >> >> Hi, >> >> I am trying to configure my petsc install with an MPI installation to >> make use of a dual quad-core desktop system running Ubuntu. But >> eventhough the configure/make process went through without problems, >> the scalability of the programs don't seem to reflect what I expected. >> My configure options are >> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> --with-debugging=1 --with-errorchecking=yes > > 1) For performance studies, make a build using --with-debugging=0 > 2) Look at -log_summary for a breakdown of performance > ?? Matt > >> >> Is there something else that needs to be done as part of the configure >> process to enable a decent scaling ? 
I am only comparing programs with >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >> same time as noted from -log_summary. If it helps, I've been testing >> with snes/examples/tutorials/ex20.c for all purposes with a custom >> -grid parameter from command-line to control the number of unknowns. >> >> If there is something you've witnessed before in this configuration or >> if you need anything else to analyze the problem, do let me know. >> >> Thanks, >> Vijay > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From knepley at gmail.com Wed Feb 2 17:15:07 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 17:15:07 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan wrote: > Matt, > > The -with-debugging=1 option is certainly not meant for performance > studies but I didn't expect it to yield the same cpu time as a single > processor for snes/ex20 i.e., my runs with 1 and 2 processors take > approximately the same amount of time for computation of solution. But > I am currently configuring without debugging symbols and shall let you > know what that yields. > > On a similar note, is there something extra that needs to be done to > make use of multi-core machines while using MPI ? I am not sure if > this is even related to PETSc but could be an MPI configuration option > that maybe either I or the configure process is missing. All ideas are > much appreciated. Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most cheap multicore machines, there is a single memory bus, and thus using more cores gains you very little extra performance. I still suspect you are not actually running in parallel, because you usually see a small speedup. That is why I suggested looking at -log_summary since it tells you how many processes were run and breaks down the time. Matt > > Vijay > > On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: > > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan > > wrote: > >> > >> Hi, > >> > >> I am trying to configure my petsc install with an MPI installation to > >> make use of a dual quad-core desktop system running Ubuntu. But > >> eventhough the configure/make process went through without problems, > >> the scalability of the programs don't seem to reflect what I expected. > >> My configure options are > >> > >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 > >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >> --with-debugging=1 --with-errorchecking=yes > > > > 1) For performance studies, make a build using --with-debugging=0 > > 2) Look at -log_summary for a breakdown of performance > > Matt > > > >> > >> Is there something else that needs to be done as part of the configure > >> process to enable a decent scaling ? I am only comparing programs with > >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the > >> same time as noted from -log_summary. 
If it helps, I've been testing > >> with snes/examples/tutorials/ex20.c for all purposes with a custom > >> -grid parameter from command-line to control the number of unknowns. > >> > >> If there is something you've witnessed before in this configuration or > >> if you need anything else to analyze the problem, do let me know. > >> > >> Thanks, > >> Vijay > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Wed Feb 2 17:38:00 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 17:38:00 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Here's the performance statistic on 1 and 2 processor runs. /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary Max Max/Min Avg Total Time (sec): 8.452e+00 1.00000 8.452e+00 Objects: 1.470e+02 1.00000 1.470e+02 Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 4.440e+02 1.00000 /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary Max Max/Min Avg Total Time (sec): 7.851e+00 1.00000 7.851e+00 Objects: 2.000e+02 1.00000 2.000e+02 Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 MPI Reductions: 1.046e+03 1.00000 I am not entirely sure if I can make sense out of that statistic but if there is something more you need, please feel free to let me know. Vijay On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: > On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan > wrote: >> >> Matt, >> >> The -with-debugging=1 option is certainly not meant for performance >> studies but I didn't expect it to yield the same cpu time as a single >> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >> approximately the same amount of time for computation of solution. But >> I am currently configuring without debugging symbols and shall let you >> know what that yields. >> >> On a similar note, is there something extra that needs to be done to >> make use of multi-core machines while using MPI ? I am not sure if >> this is even related to PETSc but could be an MPI configuration option >> that maybe either I or the configure process is missing. All ideas are >> much appreciated. > > Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most > cheap multicore machines, there is a single memory bus, and thus using more > cores gains you very little extra performance. I still suspect you are not > actually > running in parallel, because you usually see a small speedup. That is why I > suggested looking at -log_summary since it tells you how many processes were > run and breaks down the time. > ?? Matt > >> >> Vijay >> >> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >> > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >> > wrote: >> >> >> >> Hi, >> >> >> >> I am trying to configure my petsc install with an MPI installation to >> >> make use of a dual quad-core desktop system running Ubuntu. But >> >> eventhough the configure/make process went through without problems, >> >> the scalability of the programs don't seem to reflect what I expected. >> >> My configure options are >> >> >> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >> >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >> >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> >> --with-debugging=1 --with-errorchecking=yes >> > >> > 1) For performance studies, make a build using --with-debugging=0 >> > 2) Look at -log_summary for a breakdown of performance >> > ?? Matt >> > >> >> >> >> Is there something else that needs to be done as part of the configure >> >> process to enable a decent scaling ? I am only comparing programs with >> >> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >> >> same time as noted from -log_summary. If it helps, I've been testing >> >> with snes/examples/tutorials/ex20.c for all purposes with a custom >> >> -grid parameter from command-line to control the number of unknowns. >> >> >> >> If there is something you've witnessed before in this configuration or >> >> if you need anything else to analyze the problem, do let me know. >> >> >> >> Thanks, >> >> Vijay >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > From bsmith at mcs.anl.gov Wed Feb 2 18:06:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 2 Feb 2011 18:06:29 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: We need all the information from -log_summary to see what is going on. Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. Barry On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > Here's the performance statistic on 1 and 2 processor runs. 
> > /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary > > Max Max/Min Avg Total > Time (sec): 8.452e+00 1.00000 8.452e+00 > Objects: 1.470e+02 1.00000 1.470e+02 > Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 4.440e+02 1.00000 > > /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary > > Max Max/Min Avg Total > Time (sec): 7.851e+00 1.00000 7.851e+00 > Objects: 2.000e+02 1.00000 2.000e+02 > Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > MPI Reductions: 1.046e+03 1.00000 > > I am not entirely sure if I can make sense out of that statistic but > if there is something more you need, please feel free to let me know. > > Vijay > > On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >> wrote: >>> >>> Matt, >>> >>> The -with-debugging=1 option is certainly not meant for performance >>> studies but I didn't expect it to yield the same cpu time as a single >>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>> approximately the same amount of time for computation of solution. But >>> I am currently configuring without debugging symbols and shall let you >>> know what that yields. >>> >>> On a similar note, is there something extra that needs to be done to >>> make use of multi-core machines while using MPI ? I am not sure if >>> this is even related to PETSc but could be an MPI configuration option >>> that maybe either I or the configure process is missing. All ideas are >>> much appreciated. >> >> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >> cheap multicore machines, there is a single memory bus, and thus using more >> cores gains you very little extra performance. I still suspect you are not >> actually >> running in parallel, because you usually see a small speedup. That is why I >> suggested looking at -log_summary since it tells you how many processes were >> run and breaks down the time. >> Matt >> >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am trying to configure my petsc install with an MPI installation to >>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>> eventhough the configure/make process went through without problems, >>>>> the scalability of the programs don't seem to reflect what I expected. >>>>> My configure options are >>>>> >>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>> --with-debugging=1 --with-errorchecking=yes >>>> >>>> 1) For performance studies, make a build using --with-debugging=0 >>>> 2) Look at -log_summary for a breakdown of performance >>>> Matt >>>> >>>>> >>>>> Is there something else that needs to be done as part of the configure >>>>> process to enable a decent scaling ? 
I am only comparing programs with >>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>> -grid parameter from command-line to control the number of unknowns. >>>>> >>>>> If there is something you've witnessed before in this configuration or >>>>> if you need anything else to analyze the problem, do let me know. >>>>> >>>>> Thanks, >>>>> Vijay >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments >>>> is infinitely more interesting than any results to which their >>>> experiments >>>> lead. >>>> -- Norbert Wiener >>>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener >> From vijay.m at gmail.com Wed Feb 2 18:17:45 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 18:17:45 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: Barry, Please find attached the patch for the minor change to control the number of elements from command line for snes/ex20.c. I know that this can be achieved with -grid_x etc from command_line but thought this just made the typing for the refinement process a little easier. I apologize if there was any confusion. Also, find attached the full log summaries for -np=1 and -np=2. Thanks. Vijay On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: > > ?We need all the information from -log_summary to see what is going on. > > ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. > > ? Barry > > On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >> Here's the performance statistic on 1 and 2 processor runs. >> >> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >> >> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >> >> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >> >> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >> >> I am not entirely sure if I can make sense out of that statistic but >> if there is something more you need, please feel free to let me know. >> >> Vijay >> >> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. 
Mahadevan >>> wrote: >>>> >>>> Matt, >>>> >>>> The -with-debugging=1 option is certainly not meant for performance >>>> studies but I didn't expect it to yield the same cpu time as a single >>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>> approximately the same amount of time for computation of solution. But >>>> I am currently configuring without debugging symbols and shall let you >>>> know what that yields. >>>> >>>> On a similar note, is there something extra that needs to be done to >>>> make use of multi-core machines while using MPI ? I am not sure if >>>> this is even related to PETSc but could be an MPI configuration option >>>> that maybe either I or the configure process is missing. All ideas are >>>> much appreciated. >>> >>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>> cheap multicore machines, there is a single memory bus, and thus using more >>> cores gains you very little extra performance. I still suspect you are not >>> actually >>> running in parallel, because you usually see a small speedup. That is why I >>> suggested looking at -log_summary since it tells you how many processes were >>> run and breaks down the time. >>> ? ?Matt >>> >>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>> eventhough the configure/make process went through without problems, >>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>> My configure options are >>>>>> >>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>> --with-debugging=1 --with-errorchecking=yes >>>>> >>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>> 2) Look at -log_summary for a breakdown of performance >>>>> ? ?Matt >>>>> >>>>>> >>>>>> Is there something else that needs to be done as part of the configure >>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>> >>>>>> If there is something you've witnessed before in this configuration or >>>>>> if you need anything else to analyze the problem, do let me know. >>>>>> >>>>>> Thanks, >>>>>> Vijay >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments >>>>> is infinitely more interesting than any results to which their >>>>> experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments >>> is infinitely more interesting than any results to which their experiments >>> lead. >>> -- Norbert Wiener >>> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex20.patch Type: text/x-patch Size: 526 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np1.out Type: application/octet-stream Size: 11823 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np2.out Type: application/octet-stream Size: 12814 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 2 18:35:09 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 2 Feb 2011 18:35:09 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 which is 74 percent of the total solve time (and 84 percent of the flops). When 3/4th of the entire run is not parallel at all you cannot expect much speedup. If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. Barry On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > Barry, > > Please find attached the patch for the minor change to control the > number of elements from command line for snes/ex20.c. I know that this > can be achieved with -grid_x etc from command_line but thought this > just made the typing for the refinement process a little easier. I > apologize if there was any confusion. > > Also, find attached the full log summaries for -np=1 and -np=2. Thanks. > > Vijay > > On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >> >> We need all the information from -log_summary to see what is going on. >> >> Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >> >> Barry >> >> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >> >>> Here's the performance statistic on 1 and 2 processor runs. 
>>> >>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>> >>> Max Max/Min Avg Total >>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>> Objects: 1.470e+02 1.00000 1.470e+02 >>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 4.440e+02 1.00000 >>> >>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>> >>> Max Max/Min Avg Total >>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>> Objects: 2.000e+02 1.00000 2.000e+02 >>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>> MPI Reductions: 1.046e+03 1.00000 >>> >>> I am not entirely sure if I can make sense out of that statistic but >>> if there is something more you need, please feel free to let me know. >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Matt, >>>>> >>>>> The -with-debugging=1 option is certainly not meant for performance >>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>> approximately the same amount of time for computation of solution. But >>>>> I am currently configuring without debugging symbols and shall let you >>>>> know what that yields. >>>>> >>>>> On a similar note, is there something extra that needs to be done to >>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>> this is even related to PETSc but could be an MPI configuration option >>>>> that maybe either I or the configure process is missing. All ideas are >>>>> much appreciated. >>>> >>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>> cheap multicore machines, there is a single memory bus, and thus using more >>>> cores gains you very little extra performance. I still suspect you are not >>>> actually >>>> running in parallel, because you usually see a small speedup. That is why I >>>> suggested looking at -log_summary since it tells you how many processes were >>>> run and breaks down the time. >>>> Matt >>>> >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>> eventhough the configure/make process went through without problems, >>>>>>> the scalability of the programs don't seem to reflect what I expected. 
>>>>>>> My configure options are >>>>>>> >>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>> >>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>> Matt >>>>>> >>>>>>> >>>>>>> Is there something else that needs to be done as part of the configure >>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>> >>>>>>> If there is something you've witnessed before in this configuration or >>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>> >>>>>>> Thanks, >>>>>>> Vijay >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments >>>>>> is infinitely more interesting than any results to which their >>>>>> experiments >>>>>> lead. >>>>>> -- Norbert Wiener >>>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments >>>> is infinitely more interesting than any results to which their experiments >>>> lead. >>>> -- Norbert Wiener >>>> >> >> > From balay at mcs.anl.gov Wed Feb 2 18:53:50 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 2 Feb 2011 18:53:50 -0600 (CST) Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: Message-ID: On Wed, 2 Feb 2011, Vijay S. Mahadevan wrote: > On a similar note, is there something extra that needs to be done to > make use of multi-core machines while using MPI ? I am not sure if > this is even related to PETSc but could be an MPI configuration option > that maybe either I or the configure process is missing. All ideas are > much appreciated. You can try '--download-mpich --download-mpich-device=ch3:nemesis' or '--download-openmpi' both with --with-debugging=0 - and see if they make any difference [you can have a different PETSC_ARCH for each build - and then compare] Satish From vijay.m at gmail.com Wed Feb 2 23:13:50 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Wed, 2 Feb 2011 23:13:50 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: Barry, I understand what you are saying but which example/options then is the best one to compute the scalability in a multi-core machine ? I chose the nonlinear diffusion problem specifically because of its inherent stiffness that could lead probably provide noticeable scalability in a multi-core system. From your experience, do you think there is another example program that will demonstrate this much more rigorously or clearly ? Btw, I dont get good speedup even for 2 processes with ex20.c and that was the original motivation for this thread. Satish. 
I configured with --download-mpich now without the mpich-device. The results are given above. I will try with the options you provided although I dont entirely understand what they mean, which kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu ? Vijay On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > > ? Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) > > MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 > MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 > > which is 74 percent of the total solve time (and 84 percent of the flops). ? When 3/4th of the entire run is not parallel at all you cannot expect much speedup. ?If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. > > ?Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. > > ?You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. > > ?Barry > > > > On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Please find attached the patch for the minor change to control the >> number of elements from command line for snes/ex20.c. I know that this >> can be achieved with -grid_x etc from command_line but thought this >> just made the typing for the refinement process a little easier. I >> apologize if there was any confusion. >> >> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. >> >> Vijay >> >> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>> >>> ?We need all the information from -log_summary to see what is going on. >>> >>> ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>> >>> ? Barry >>> >>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>> >>>> Here's the performance statistic on 1 and 2 processor runs. >>>> >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>> >>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>> MPI Message Lengths: ?0.000e+00 ? ? 
?0.00000 ? 0.000e+00 ?0.000e+00 >>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>> >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>> >>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>> >>>> I am not entirely sure if I can make sense out of that statistic but >>>> if there is something more you need, please feel free to let me know. >>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Matt, >>>>>> >>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>> approximately the same amount of time for computation of solution. But >>>>>> I am currently configuring without debugging symbols and shall let you >>>>>> know what that yields. >>>>>> >>>>>> On a similar note, is there something extra that needs to be done to >>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>> much appreciated. >>>>> >>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>> cores gains you very little extra performance. I still suspect you are not >>>>> actually >>>>> running in parallel, because you usually see a small speedup. That is why I >>>>> suggested looking at -log_summary since it tells you how many processes were >>>>> run and breaks down the time. >>>>> ? ?Matt >>>>> >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>> My configure options are >>>>>>>> >>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>> >>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>> process to enable a decent scaling ? 
I am only comparing programs with >>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>> >>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vijay >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments >>>>>>> is infinitely more interesting than any results to which their >>>>>>> experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments >>>>> is infinitely more interesting than any results to which their experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>> >>> >> > > From knepley at gmail.com Wed Feb 2 23:18:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Feb 2011 23:18:46 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: On Wed, Feb 2, 2011 at 11:13 PM, Vijay S. Mahadevan wrote: > Barry, > > I understand what you are saying but which example/options then is the > best one to compute the scalability in a multi-core machine ? I chose > the nonlinear diffusion problem specifically because of its inherent > stiffness that could lead probably provide noticeable scalability in a > multi-core system. From your experience, do you think there is another > example program that will demonstrate this much more rigorously or > clearly ? Btw, I dont get good speedup even for 2 processes with > ex20.c and that was the original motivation for this thread. > Very simply, Barry said your coarse grid is way too big. Make it smaller and you will see speedup. Matt > Satish. I configured with --download-mpich now without the > mpich-device. The results are given above. I will try with the options > you provided although I dont entirely understand what they mean, which > kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > ? > > Vijay > > On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > > > > Ok, everything makes sense. Looks like you are using two level > multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant > -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid > problem redundantly on each process (each process is solving the entire > coarse grid solve using LU factorization). The time for the factorization is > (in the two process case) > > > > MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 > > MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 > > > > which is 74 percent of the total solve time (and 84 percent of the > flops). When 3/4th of the entire run is not parallel at all you cannot > expect much speedup. If you run with -snes_view it will display exactly the > solver being used. You cannot expect to understand the performance if you > don't understand what the solver is actually doing. 
Using a 20 by 20 by 20 > coarse grid is generally a bad idea since the code spends most of the time > there, stick with something like 5 by 5 by 5. > > > > Suggest running with the default grid and -dmmg_nlevels 5 now the > percent in the coarse solve will be a trivial percent of the run time. > > > > You should get pretty good speed up for 2 processes but not much better > speedup for four processes because as Matt noted the computation is memory > bandwidth limited; > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. > Note also that this is running multigrid which is a fast solver, but doesn't > parallel scale as well many slow algorithms. For example if you run > -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 > processors but crummy speed. > > > > Barry > > > > > > > > On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > > > >> Barry, > >> > >> Please find attached the patch for the minor change to control the > >> number of elements from command line for snes/ex20.c. I know that this > >> can be achieved with -grid_x etc from command_line but thought this > >> just made the typing for the refinement process a little easier. I > >> apologize if there was any confusion. > >> > >> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. > >> > >> Vijay > >> > >> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: > >>> > >>> We need all the information from -log_summary to see what is going on. > >>> > >>> Not sure what -grid 20 means but don't expect any good parallel > performance with less than at least 10,000 unknowns per process. > >>> > >>> Barry > >>> > >>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >>> > >>>> Here's the performance statistic on 1 and 2 processor runs. > >>>> > >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 > -log_summary > >>>> > >>>> Max Max/Min Avg Total > >>>> Time (sec): 8.452e+00 1.00000 8.452e+00 > >>>> Objects: 1.470e+02 1.00000 1.470e+02 > >>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > >>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > >>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>> MPI Reductions: 4.440e+02 1.00000 > >>>> > >>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 > -log_summary > >>>> > >>>> Max Max/Min Avg Total > >>>> Time (sec): 7.851e+00 1.00000 7.851e+00 > >>>> Objects: 2.000e+02 1.00000 2.000e+02 > >>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > >>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > >>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > >>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > >>>> MPI Reductions: 1.046e+03 1.00000 > >>>> > >>>> I am not entirely sure if I can make sense out of that statistic but > >>>> if there is something more you need, please feel free to let me know. > >>>> > >>>> Vijay > >>>> > >>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley > wrote: > >>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> Matt, > >>>>>> > >>>>>> The -with-debugging=1 option is certainly not meant for performance > >>>>>> studies but I didn't expect it to yield the same cpu time as a > single > >>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take > >>>>>> approximately the same amount of time for computation of solution. 
> But > >>>>>> I am currently configuring without debugging symbols and shall let > you > >>>>>> know what that yields. > >>>>>> > >>>>>> On a similar note, is there something extra that needs to be done to > >>>>>> make use of multi-core machines while using MPI ? I am not sure if > >>>>>> this is even related to PETSc but could be an MPI configuration > option > >>>>>> that maybe either I or the configure process is missing. All ideas > are > >>>>>> much appreciated. > >>>>> > >>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On > most > >>>>> cheap multicore machines, there is a single memory bus, and thus > using more > >>>>> cores gains you very little extra performance. I still suspect you > are not > >>>>> actually > >>>>> running in parallel, because you usually see a small speedup. That is > why I > >>>>> suggested looking at -log_summary since it tells you how many > processes were > >>>>> run and breaks down the time. > >>>>> Matt > >>>>> > >>>>>> > >>>>>> Vijay > >>>>>> > >>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley > wrote: > >>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I am trying to configure my petsc install with an MPI installation > to > >>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But > >>>>>>>> eventhough the configure/make process went through without > problems, > >>>>>>>> the scalability of the programs don't seem to reflect what I > expected. > >>>>>>>> My configure options are > >>>>>>>> > >>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ > --download-mpich=1 > >>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 > >>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >>>>>>>> --with-debugging=1 --with-errorchecking=yes > >>>>>>> > >>>>>>> 1) For performance studies, make a build using --with-debugging=0 > >>>>>>> 2) Look at -log_summary for a breakdown of performance > >>>>>>> Matt > >>>>>>> > >>>>>>>> > >>>>>>>> Is there something else that needs to be done as part of the > configure > >>>>>>>> process to enable a decent scaling ? I am only comparing programs > with > >>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately > the > >>>>>>>> same time as noted from -log_summary. If it helps, I've been > testing > >>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom > >>>>>>>> -grid parameter from command-line to control the number of > unknowns. > >>>>>>>> > >>>>>>>> If there is something you've witnessed before in this > configuration or > >>>>>>>> if you need anything else to analyze the problem, do let me know. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Vijay > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> What most experimenters take for granted before they begin their > >>>>>>> experiments > >>>>>>> is infinitely more interesting than any results to which their > >>>>>>> experiments > >>>>>>> lead. > >>>>>>> -- Norbert Wiener > >>>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > experiments > >>>>> is infinitely more interesting than any results to which their > experiments > >>>>> lead. 
> >>>>> -- Norbert Wiener > >>>>> > >>> > >>> > >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkuhlem at emory.edu Thu Feb 3 10:35:39 2011 From: vkuhlem at emory.edu (Verena Kuhlemann) Date: Thu, 3 Feb 2011 11:35:39 -0500 Subject: [petsc-users] LU vs. ILU Message-ID: Hello, I am somewhat confused about the usage of PCLU and PCILU in PETSc. It seems as if there are the same options to choose from for both. In particular, if I use PCLU and PCFactorSetFill(pc,5) don't I end up with an incomplete LU factorization? If I use PCLU with no other options set, will the factorization be complete? Thanks, Verena -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 3 10:45:43 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 10:45:43 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: On Thu, Feb 3, 2011 at 10:35 AM, Verena Kuhlemann wrote: > Hello, > > I am somewhat confused about the usage of PCLU and PCILU in PETSc. > It seems as if there are the same options to choose from for both. > In particular, if I use PCLU and PCFactorSetFill(pc,5) > don't I end up with an incomplete LU factorization? > If I use PCLU with no other options set, will the > factorization be complete? > LU ignores the fill option. It always gives the complete factorization. Matt > Thanks, > Verena > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 3 10:58:32 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 3 Feb 2011 10:58:32 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: Verena: > I am somewhat confused about the usage of PCLU and PCILU in PETSc. > It seems as if there are the same options to choose from for both. > In particular, if I use PCLU and PCFactorSetFill(pc,5) > don't I end up with an incomplete LU factorization? No. When the user-provided fill is not sufficient, PETSc will increase it to whatever LU requires and run the LU factorization. In this case, a few or more calls to malloc() will be made, which could be expensive. > If I use PCLU with no other options set, will the > factorization be complete? Yes, it will be complete. But providing good estimates for these options will make the computation efficient. Hong From knepley at gmail.com Thu Feb 3 11:07:09 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 11:07:09 -0600 Subject: [petsc-users] LU vs. ILU In-Reply-To: References: Message-ID: On Thu, Feb 3, 2011 at 10:45 AM, Matthew Knepley wrote: > On Thu, Feb 3, 2011 at 10:35 AM, Verena Kuhlemann wrote: > >> Hello, >> >> I am somewhat confused about the usage of PCLU and PCILU in PETSc. >> It seems as if there are the same options to choose from for both. >> In particular, if I use PCLU and PCFactorSetFill(pc,5) >> don't I end up with an incomplete LU factorization? >> If I use PCLU with no other options set, will the >> factorization be complete? >> > > LU ignores the fill option. It always gives the complete factorization. > Hong is right. I did not mean ignore, but rather will always keep allocating.
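To make the distinction concrete, here is a minimal run-time sketch (the executable is shown as ./ex2, the standard KSP tutorial example, purely as a placeholder; the fill and level values are illustrative, not recommendations). With -pc_type lu the -pc_factor_fill value is only a memory-preallocation estimate and the factorization stays complete; with -pc_type ilu the -pc_factor_levels value actually changes the preconditioner by limiting the fill that is kept:

    # Complete LU: fill is just a preallocation guess; PETSc reallocates if it
    # turns out to be too small, but the factors are still exact.
    ./ex2 -ksp_type preonly -pc_type lu -pc_factor_fill 5 -ksp_view

    # Incomplete ILU(k): the levels option limits the fill that is kept, so it
    # changes the quality of the preconditioner itself.
    ./ex2 -ksp_type gmres -pc_type ilu -pc_factor_levels 2 -pc_factor_fill 5 -ksp_view

In either case -ksp_view prints the preconditioner that was actually built, which is an easy way to confirm which factorization you ended up with.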
Matt > Matt > > >> Thanks, >> Verena >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 11:10:07 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 11:10:07 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> Message-ID: <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > Barry, > > I understand what you are saying but which example/options then is the > best one to compute the scalability in a multi-core machine ? I chose > the nonlinear diffusion problem specifically because of its inherent > stiffness that could lead probably provide noticeable scalability in a > multi-core system. From your experience, do you think there is another > example program that will demonstrate this much more rigorously or > clearly ? Btw, I dont get good speedup even for 2 processes with > ex20.c and that was the original motivation for this thread. Did you follow my instructions? Barry > > Satish. I configured with --download-mpich now without the > mpich-device. The results are given above. I will try with the options > you provided although I dont entirely understand what they mean, which > kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > ? > > Vijay > > On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >> >> Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) >> >> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >> >> which is 74 percent of the total solve time (and 84 percent of the flops). When 3/4th of the entire run is not parallel at all you cannot expect much speedup. If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. >> >> Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. >> >> You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. 
For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. >> >> Barry >> >> >> >> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Please find attached the patch for the minor change to control the >>> number of elements from command line for snes/ex20.c. I know that this >>> can be achieved with -grid_x etc from command_line but thought this >>> just made the typing for the refinement process a little easier. I >>> apologize if there was any confusion. >>> >>> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. >>> >>> Vijay >>> >>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>>> >>>> We need all the information from -log_summary to see what is going on. >>>> >>>> Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>>> >>>> Barry >>>> >>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>> >>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Reductions: 4.440e+02 1.00000 >>>>> >>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>> MPI Reductions: 1.046e+03 1.00000 >>>>> >>>>> I am not entirely sure if I can make sense out of that statistic but >>>>> if there is something more you need, please feel free to let me know. >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Matt, >>>>>>> >>>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>>> approximately the same amount of time for computation of solution. But >>>>>>> I am currently configuring without debugging symbols and shall let you >>>>>>> know what that yields. >>>>>>> >>>>>>> On a similar note, is there something extra that needs to be done to >>>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>>> much appreciated. >>>>>> >>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>>> cores gains you very little extra performance. I still suspect you are not >>>>>> actually >>>>>> running in parallel, because you usually see a small speedup. 
That is why I >>>>>> suggested looking at -log_summary since it tells you how many processes were >>>>>> run and breaks down the time. >>>>>> Matt >>>>>> >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>>> My configure options are >>>>>>>>> >>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>> >>>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>> Matt >>>>>>>> >>>>>>>>> >>>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>>> >>>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vijay >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments >>>>>>>> is infinitely more interesting than any results to which their >>>>>>>> experiments >>>>>>>> lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments >>>>>> is infinitely more interesting than any results to which their experiments >>>>>> lead. >>>>>> -- Norbert Wiener >>>>>> >>>> >>>> >>> >> >> From vijay.m at gmail.com Thu Feb 3 11:37:33 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 11:37:33 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, Sorry about the delay in the reply. I did not have access to the system to test out what you said, until now. I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 processor time 1 114.2 2 89.45 4 81.01 The scaleup doesn't seem to be optimal, even with two processors. I am wondering if the fault is in the MPI configuration itself. Are these results as you would expect ? I can also send you the log_summary for all cases if that will help. 
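A minimal sketch of one way to collect the full logs for each process count in a single pass (the mpiexec path, the output file names, and the choice of 1/2/4 processes are placeholders; the solver options simply repeat the ones used for the timings above):

    # Use the mpiexec that belongs to the PETSc build, e.g. the one under
    # $PETSC_DIR/$PETSC_ARCH/bin, so the intended MPI is actually exercised.
    for np in 1 2 4; do
        mpiexec -n $np ./ex20 -dmmg_nlevels 5 -pc_type jacobi -log_summary \
            > ex20_np${np}.log 2>&1
    done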
Vijay On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: > > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> I understand what you are saying but which example/options then is the >> best one to compute the scalability in a multi-core machine ? I chose >> the nonlinear diffusion problem specifically because of its inherent >> stiffness that could lead probably provide noticeable scalability in a >> multi-core system. From your experience, do you think there is another >> example program that will demonstrate this much more rigorously or >> clearly ? Btw, I dont get good speedup even for 2 processes with >> ex20.c and that was the original motivation for this thread. > > ? Did you follow my instructions? > > ? Barry > >> >> Satish. I configured with --download-mpich now without the >> mpich-device. The results are given above. I will try with the options >> you provided although I dont entirely understand what they mean, which >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >> ? >> >> Vijay >> >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>> >>> ? Ok, everything makes sense. Looks like you are using two level multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid problem redundantly on each process (each process is solving the entire coarse grid solve using LU factorization). The time for the factorization is (in the two process case) >>> >>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>> >>> which is 74 percent of the total solve time (and 84 percent of the flops). ? When 3/4th of the entire run is not parallel at all you cannot expect much speedup. ?If you run with -snes_view it will display exactly the solver being used. You cannot expect to understand the performance if you don't understand what the solver is actually doing. Using a 20 by 20 by 20 coarse grid is generally a bad idea since the code spends most of the time there, stick with something like 5 by 5 by 5. >>> >>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the percent in the coarse solve will be a trivial percent of the run time. >>> >>> ?You should get pretty good speed up for 2 processes but not much better speedup for four processes because as Matt noted the computation is memory bandwidth limited; http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note also that this is running multigrid which is a fast solver, but doesn't parallel scale as well many slow algorithms. For example if you run -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 processors but crummy speed. >>> >>> ?Barry >>> >>> >>> >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Please find attached the patch for the minor change to control the >>>> number of elements from command line for snes/ex20.c. I know that this >>>> can be achieved with -grid_x etc from command_line but thought this >>>> just made the typing for the refinement process a little easier. I >>>> apologize if there was any confusion. >>>> >>>> Also, find attached the full log summaries for -np=1 and -np=2. Thanks. 
>>>> >>>> Vijay >>>> >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith wrote: >>>>> >>>>> ?We need all the information from -log_summary to see what is going on. >>>>> >>>>> ?Not sure what -grid 20 means but don't expect any good parallel performance with less than at least 10,000 unknowns per process. >>>>> >>>>> ? Barry >>>>> >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary >>>>>> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary >>>>>> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>> >>>>>> I am not entirely sure if I can make sense out of that statistic but >>>>>> if there is something more you need, please feel free to let me know. >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley wrote: >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Matt, >>>>>>>> >>>>>>>> The -with-debugging=1 option is certainly not meant for performance >>>>>>>> studies but I didn't expect it to yield the same cpu time as a single >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take >>>>>>>> approximately the same amount of time for computation of solution. But >>>>>>>> I am currently configuring without debugging symbols and shall let you >>>>>>>> know what that yields. >>>>>>>> >>>>>>>> On a similar note, is there something extra that needs to be done to >>>>>>>> make use of multi-core machines while using MPI ? I am not sure if >>>>>>>> this is even related to PETSc but could be an MPI configuration option >>>>>>>> that maybe either I or the configure process is missing. All ideas are >>>>>>>> much appreciated. >>>>>>> >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On most >>>>>>> cheap multicore machines, there is a single memory bus, and thus using more >>>>>>> cores gains you very little extra performance. I still suspect you are not >>>>>>> actually >>>>>>> running in parallel, because you usually see a small speedup. That is why I >>>>>>> suggested looking at -log_summary since it tells you how many processes were >>>>>>> run and breaks down the time. >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley wrote: >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I am trying to configure my petsc install with an MPI installation to >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>> eventhough the configure/make process went through without problems, >>>>>>>>>> the scalability of the programs don't seem to reflect what I expected. >>>>>>>>>> My configure options are >>>>>>>>>> >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>> >>>>>>>>> 1) For performance studies, make a build using --with-debugging=0 >>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>> ? ?Matt >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is there something else that needs to be done as part of the configure >>>>>>>>>> process to enable a decent scaling ? I am only comparing programs with >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking approximately the >>>>>>>>>> same time as noted from -log_summary. If it helps, I've been testing >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a custom >>>>>>>>>> -grid parameter from command-line to control the number of unknowns. >>>>>>>>>> >>>>>>>>>> If there is something you've witnessed before in this configuration or >>>>>>>>>> if you need anything else to analyze the problem, do let me know. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments >>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>> experiments >>>>>>>>> lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>> >>>>> >>>> >>> >>> > > From knepley at gmail.com Thu Feb 3 11:42:57 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 3 Feb 2011 11:42:57 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan wrote: > Barry, > > Sorry about the delay in the reply. I did not have access to the > system to test out what you said, until now. > > I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 > -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 > > processor time > 1 114.2 > 2 89.45 > 4 81.01 > 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from this data. 2) Do you know the memory bandwidth characteristics of this machine? That is crucial and you cannot begin to understand speedup on it until you do. Please look this up. 3) Worrying about specifics of the MPI implementation makes no sense until the basics are nailed down. Matt > The scaleup doesn't seem to be optimal, even with two processors. I am > wondering if the fault is in the MPI configuration itself. 
Are these > results as you would expect ? I can also send you the log_summary for > all cases if that will help. > > Vijay > > On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: > > > > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: > > > >> Barry, > >> > >> I understand what you are saying but which example/options then is the > >> best one to compute the scalability in a multi-core machine ? I chose > >> the nonlinear diffusion problem specifically because of its inherent > >> stiffness that could lead probably provide noticeable scalability in a > >> multi-core system. From your experience, do you think there is another > >> example program that will demonstrate this much more rigorously or > >> clearly ? Btw, I dont get good speedup even for 2 processes with > >> ex20.c and that was the original motivation for this thread. > > > > Did you follow my instructions? > > > > Barry > > > >> > >> Satish. I configured with --download-mpich now without the > >> mpich-device. The results are given above. I will try with the options > >> you provided although I dont entirely understand what they mean, which > >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu > >> ? > >> > >> Vijay > >> > >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: > >>> > >>> Ok, everything makes sense. Looks like you are using two level > multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant > -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid > problem redundantly on each process (each process is solving the entire > coarse grid solve using LU factorization). The time for the factorization is > (in the two process case) > >>> > >>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 > >>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 > >>> > >>> which is 74 percent of the total solve time (and 84 percent of the > flops). When 3/4th of the entire run is not parallel at all you cannot > expect much speedup. If you run with -snes_view it will display exactly the > solver being used. You cannot expect to understand the performance if you > don't understand what the solver is actually doing. Using a 20 by 20 by 20 > coarse grid is generally a bad idea since the code spends most of the time > there, stick with something like 5 by 5 by 5. > >>> > >>> Suggest running with the default grid and -dmmg_nlevels 5 now the > percent in the coarse solve will be a trivial percent of the run time. > >>> > >>> You should get pretty good speed up for 2 processes but not much > better speedup for four processes because as Matt noted the computation is > memory bandwidth limited; > http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. > Note also that this is running multigrid which is a fast solver, but doesn't > parallel scale as well many slow algorithms. For example if you run > -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 > processors but crummy speed. > >>> > >>> Barry > >>> > >>> > >>> > >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: > >>> > >>>> Barry, > >>>> > >>>> Please find attached the patch for the minor change to control the > >>>> number of elements from command line for snes/ex20.c. I know that this > >>>> can be achieved with -grid_x etc from command_line but thought this > >>>> just made the typing for the refinement process a little easier. I > >>>> apologize if there was any confusion. 
> >>>> > >>>> Also, find attached the full log summaries for -np=1 and -np=2. > Thanks. > >>>> > >>>> Vijay > >>>> > >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith > wrote: > >>>>> > >>>>> We need all the information from -log_summary to see what is going > on. > >>>>> > >>>>> Not sure what -grid 20 means but don't expect any good parallel > performance with less than at least 10,000 unknowns per process. > >>>>> > >>>>> Barry > >>>>> > >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: > >>>>> > >>>>>> Here's the performance statistic on 1 and 2 processor runs. > >>>>>> > >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 > -log_summary > >>>>>> > >>>>>> Max Max/Min Avg Total > >>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 > >>>>>> Objects: 1.470e+02 1.00000 1.470e+02 > >>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 > >>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 > >>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > >>>>>> MPI Reductions: 4.440e+02 1.00000 > >>>>>> > >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 > -log_summary > >>>>>> > >>>>>> Max Max/Min Avg Total > >>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 > >>>>>> Objects: 2.000e+02 1.00000 2.000e+02 > >>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 > >>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 > >>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 > >>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 > >>>>>> MPI Reductions: 1.046e+03 1.00000 > >>>>>> > >>>>>> I am not entirely sure if I can make sense out of that statistic but > >>>>>> if there is something more you need, please feel free to let me > know. > >>>>>> > >>>>>> Vijay > >>>>>> > >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley > wrote: > >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan < > vijay.m at gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Matt, > >>>>>>>> > >>>>>>>> The -with-debugging=1 option is certainly not meant for > performance > >>>>>>>> studies but I didn't expect it to yield the same cpu time as a > single > >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors take > >>>>>>>> approximately the same amount of time for computation of solution. > But > >>>>>>>> I am currently configuring without debugging symbols and shall let > you > >>>>>>>> know what that yields. > >>>>>>>> > >>>>>>>> On a similar note, is there something extra that needs to be done > to > >>>>>>>> make use of multi-core machines while using MPI ? I am not sure if > >>>>>>>> this is even related to PETSc but could be an MPI configuration > option > >>>>>>>> that maybe either I or the configure process is missing. All ideas > are > >>>>>>>> much appreciated. > >>>>>>> > >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. On > most > >>>>>>> cheap multicore machines, there is a single memory bus, and thus > using more > >>>>>>> cores gains you very little extra performance. I still suspect you > are not > >>>>>>> actually > >>>>>>> running in parallel, because you usually see a small speedup. That > is why I > >>>>>>> suggested looking at -log_summary since it tells you how many > processes were > >>>>>>> run and breaks down the time. > >>>>>>> Matt > >>>>>>> > >>>>>>>> > >>>>>>>> Vijay > >>>>>>>> > >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley < > knepley at gmail.com> wrote: > >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan < > vijay.m at gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I am trying to configure my petsc install with an MPI > installation to > >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But > >>>>>>>>>> eventhough the configure/make process went through without > problems, > >>>>>>>>>> the scalability of the programs don't seem to reflect what I > expected. > >>>>>>>>>> My configure options are > >>>>>>>>>> > >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ > --download-mpich=1 > >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g > >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 > --download-hypre=1 > >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ > >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes > >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes > >>>>>>>>> > >>>>>>>>> 1) For performance studies, make a build using --with-debugging=0 > >>>>>>>>> 2) Look at -log_summary for a breakdown of performance > >>>>>>>>> Matt > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Is there something else that needs to be done as part of the > configure > >>>>>>>>>> process to enable a decent scaling ? I am only comparing > programs with > >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking > approximately the > >>>>>>>>>> same time as noted from -log_summary. If it helps, I've been > testing > >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a > custom > >>>>>>>>>> -grid parameter from command-line to control the number of > unknowns. > >>>>>>>>>> > >>>>>>>>>> If there is something you've witnessed before in this > configuration or > >>>>>>>>>> if you need anything else to analyze the problem, do let me > know. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Vijay > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> What most experimenters take for granted before they begin their > >>>>>>>>> experiments > >>>>>>>>> is infinitely more interesting than any results to which their > >>>>>>>>> experiments > >>>>>>>>> lead. > >>>>>>>>> -- Norbert Wiener > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> What most experimenters take for granted before they begin their > experiments > >>>>>>> is infinitely more interesting than any results to which their > experiments > >>>>>>> lead. > >>>>>>> -- Norbert Wiener > >>>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Thu Feb 3 12:05:15 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 12:05:15 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Matt, I apologize for the incomplete information. Find attached the log_summary for all the cases. The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with 2x2GB/2x4GB configuration. I do not know how to decipher the memory bandwidth with this information but if you need anything more, do let me know. VIjay On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: > On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. 
Mahadevan > wrote: >> >> Barry, >> >> Sorry about the delay in the reply. I did not have access to the >> system to test out what you said, until now. >> >> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >> >> processor ? ? ? time >> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 > > 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from > this data. > 2) Do you know the memory bandwidth characteristics of this machine? That is > crucial and > ?? ?you cannot begin to understand speedup on it until you do. Please look > this up. > 3) Worrying about specifics of the MPI implementation makes no sense until > the basics are nailed down. > ?? Matt > >> >> The scaleup doesn't seem to be optimal, even with two processors. I am >> wondering if the fault is in the MPI configuration itself. Are these >> results as you would expect ? I can also send you the log_summary for >> all cases if that will help. >> >> Vijay >> >> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >> > >> > On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >> > >> >> Barry, >> >> >> >> I understand what you are saying but which example/options then is the >> >> best one to compute the scalability in a multi-core machine ? I chose >> >> the nonlinear diffusion problem specifically because of its inherent >> >> stiffness that could lead probably provide noticeable scalability in a >> >> multi-core system. From your experience, do you think there is another >> >> example program that will demonstrate this much more rigorously or >> >> clearly ? Btw, I dont get good speedup even for 2 processes with >> >> ex20.c and that was the original motivation for this thread. >> > >> > ? Did you follow my instructions? >> > >> > ? Barry >> > >> >> >> >> Satish. I configured with --download-mpich now without the >> >> mpich-device. The results are given above. I will try with the options >> >> you provided although I dont entirely understand what they mean, which >> >> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >> >> ? >> >> >> >> Vijay >> >> >> >> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >> >>> >> >>> ? Ok, everything makes sense. Looks like you are using two level >> >>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >> >>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >> >>> problem redundantly on each process (each process is solving the entire >> >>> coarse grid solve using LU factorization). The time for the factorization is >> >>> (in the two process case) >> >>> >> >>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >> >>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >> >>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >> >>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >> >>> >> >>> which is 74 percent of the total solve time (and 84 percent of the >> >>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >> >>> expect much speedup. ?If you run with -snes_view it will display exactly the >> >>> solver being used. You cannot expect to understand the performance if you >> >>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >> >>> coarse grid is generally a bad idea since the code spends most of the time >> >>> there, stick with something like 5 by 5 by 5. 
>> >>> >> >>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >> >>> percent in the coarse solve will be a trivial percent of the run time. >> >>> >> >>> ?You should get pretty good speed up for 2 processes but not much >> >>> better speedup for four processes because as Matt noted the computation is >> >>> memory bandwidth limited; >> >>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >> >>> also that this is running multigrid which is a fast solver, but doesn't >> >>> parallel scale as well many slow algorithms. For example if you run >> >>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >> >>> processors but crummy speed. >> >>> >> >>> ?Barry >> >>> >> >>> >> >>> >> >>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >> >>> >> >>>> Barry, >> >>>> >> >>>> Please find attached the patch for the minor change to control the >> >>>> number of elements from command line for snes/ex20.c. I know that >> >>>> this >> >>>> can be achieved with -grid_x etc from command_line but thought this >> >>>> just made the typing for the refinement process a little easier. I >> >>>> apologize if there was any confusion. >> >>>> >> >>>> Also, find attached the full log summaries for -np=1 and -np=2. >> >>>> Thanks. >> >>>> >> >>>> Vijay >> >>>> >> >>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >> >>>> wrote: >> >>>>> >> >>>>> ?We need all the information from -log_summary to see what is going >> >>>>> on. >> >>>>> >> >>>>> ?Not sure what -grid 20 means but don't expect any good parallel >> >>>>> performance with less than at least 10,000 unknowns per process. >> >>>>> >> >>>>> ? Barry >> >>>>> >> >>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >> >>>>> >> >>>>>> Here's the performance statistic on 1 and 2 processor runs. >> >>>>>> >> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >> >>>>>> -log_summary >> >>>>>> >> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> >>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >> >>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >> >>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >> >>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >> >>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> >>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >> >>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >> >>>>>> >> >>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >> >>>>>> -log_summary >> >>>>>> >> >>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >> >>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >> >>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >> >>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >> >>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >> >>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >> >>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >> >>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >> >>>>>> >> >>>>>> I am not entirely sure if I can make sense out of that statistic >> >>>>>> but >> >>>>>> if there is something more you need, please feel free to let me >> >>>>>> know. 
>> >>>>>> >> >>>>>> Vijay >> >>>>>> >> >>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >> >>>>>> wrote: >> >>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >> >>>>>>> >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> Matt, >> >>>>>>>> >> >>>>>>>> The -with-debugging=1 option is certainly not meant for >> >>>>>>>> performance >> >>>>>>>> studies but I didn't expect it to yield the same cpu time as a >> >>>>>>>> single >> >>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >> >>>>>>>> take >> >>>>>>>> approximately the same amount of time for computation of >> >>>>>>>> solution. But >> >>>>>>>> I am currently configuring without debugging symbols and shall >> >>>>>>>> let you >> >>>>>>>> know what that yields. >> >>>>>>>> >> >>>>>>>> On a similar note, is there something extra that needs to be done >> >>>>>>>> to >> >>>>>>>> make use of multi-core machines while using MPI ? I am not sure >> >>>>>>>> if >> >>>>>>>> this is even related to PETSc but could be an MPI configuration >> >>>>>>>> option >> >>>>>>>> that maybe either I or the configure process is missing. All >> >>>>>>>> ideas are >> >>>>>>>> much appreciated. >> >>>>>>> >> >>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >> >>>>>>> On most >> >>>>>>> cheap multicore machines, there is a single memory bus, and thus >> >>>>>>> using more >> >>>>>>> cores gains you very little extra performance. I still suspect you >> >>>>>>> are not >> >>>>>>> actually >> >>>>>>> running in parallel, because you usually see a small speedup. That >> >>>>>>> is why I >> >>>>>>> suggested looking at -log_summary since it tells you how many >> >>>>>>> processes were >> >>>>>>> run and breaks down the time. >> >>>>>>> ? ?Matt >> >>>>>>> >> >>>>>>>> >> >>>>>>>> Vijay >> >>>>>>>> >> >>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >> >>>>>>>> wrote: >> >>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >> >>>>>>>>> >> >>>>>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>> Hi, >> >>>>>>>>>> >> >>>>>>>>>> I am trying to configure my petsc install with an MPI >> >>>>>>>>>> installation to >> >>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >> >>>>>>>>>> eventhough the configure/make process went through without >> >>>>>>>>>> problems, >> >>>>>>>>>> the scalability of the programs don't seem to reflect what I >> >>>>>>>>>> expected. >> >>>>>>>>>> My configure options are >> >>>>>>>>>> >> >>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >> >>>>>>>>>> --download-mpich=1 >> >>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >> >>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >> >>>>>>>>>> --download-hypre=1 >> >>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >> >>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >> >>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >> >>>>>>>>> >> >>>>>>>>> 1) For performance studies, make a build using >> >>>>>>>>> --with-debugging=0 >> >>>>>>>>> 2) Look at -log_summary for a breakdown of performance >> >>>>>>>>> ? ?Matt >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Is there something else that needs to be done as part of the >> >>>>>>>>>> configure >> >>>>>>>>>> process to enable a decent scaling ? I am only comparing >> >>>>>>>>>> programs with >> >>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >> >>>>>>>>>> approximately the >> >>>>>>>>>> same time as noted from -log_summary. 
If it helps, I've been >> >>>>>>>>>> testing >> >>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >> >>>>>>>>>> custom >> >>>>>>>>>> -grid parameter from command-line to control the number of >> >>>>>>>>>> unknowns. >> >>>>>>>>>> >> >>>>>>>>>> If there is something you've witnessed before in this >> >>>>>>>>>> configuration or >> >>>>>>>>>> if you need anything else to analyze the problem, do let me >> >>>>>>>>>> know. >> >>>>>>>>>> >> >>>>>>>>>> Thanks, >> >>>>>>>>>> Vijay >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -- >> >>>>>>>>> What most experimenters take for granted before they begin their >> >>>>>>>>> experiments >> >>>>>>>>> is infinitely more interesting than any results to which their >> >>>>>>>>> experiments >> >>>>>>>>> lead. >> >>>>>>>>> -- Norbert Wiener >> >>>>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> What most experimenters take for granted before they begin their >> >>>>>>> experiments >> >>>>>>> is infinitely more interesting than any results to which their >> >>>>>>> experiments >> >>>>>>> lead. >> >>>>>>> -- Norbert Wiener >> >>>>>>> >> >>>>> >> >>>>> >> >>>> >> >>> >> >>> >> > >> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np1.out Type: application/octet-stream Size: 12365 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np2.out Type: application/octet-stream Size: 13469 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex20_np4.out Type: application/octet-stream Size: 14749 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Feb 3 13:17:28 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 13:17:28 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Vljay Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ 1 process VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 2 processes VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. 
3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). Barry On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: > Matt, > > I apologize for the incomplete information. Find attached the > log_summary for all the cases. > > The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with > 2x2GB/2x4GB configuration. I do not know how to decipher the memory > bandwidth with this information but if you need anything more, do let > me know. > > VIjay > > On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >> wrote: >>> >>> Barry, >>> >>> Sorry about the delay in the reply. I did not have access to the >>> system to test out what you said, until now. >>> >>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>> >>> processor time >>> 1 114.2 >>> 2 89.45 >>> 4 81.01 >> >> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >> this data. >> 2) Do you know the memory bandwidth characteristics of this machine? That is >> crucial and >> you cannot begin to understand speedup on it until you do. Please look >> this up. >> 3) Worrying about specifics of the MPI implementation makes no sense until >> the basics are nailed down. >> Matt >> >>> >>> The scaleup doesn't seem to be optimal, even with two processors. I am >>> wondering if the fault is in the MPI configuration itself. Are these >>> results as you would expect ? I can also send you the log_summary for >>> all cases if that will help. >>> >>> Vijay >>> >>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>> >>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, >>>>> >>>>> I understand what you are saying but which example/options then is the >>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>> the nonlinear diffusion problem specifically because of its inherent >>>>> stiffness that could lead probably provide noticeable scalability in a >>>>> multi-core system. From your experience, do you think there is another >>>>> example program that will demonstrate this much more rigorously or >>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>> ex20.c and that was the original motivation for this thread. >>>> >>>> Did you follow my instructions? >>>> >>>> Barry >>>> >>>>> >>>>> Satish. I configured with --download-mpich now without the >>>>> mpich-device. The results are given above. I will try with the options >>>>> you provided although I dont entirely understand what they mean, which >>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>> ? >>>>> >>>>> Vijay >>>>> >>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>> >>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>> problem redundantly on each process (each process is solving the entire >>>>>> coarse grid solve using LU factorization). 
The time for the factorization is >>>>>> (in the two process case) >>>>>> >>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>> >>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>> solver being used. You cannot expect to understand the performance if you >>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>> there, stick with something like 5 by 5 by 5. >>>>>> >>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>> >>>>>> You should get pretty good speed up for 2 processes but not much >>>>>> better speedup for four processes because as Matt noted the computation is >>>>>> memory bandwidth limited; >>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>> processors but crummy speed. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> Please find attached the patch for the minor change to control the >>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>> this >>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>> just made the typing for the refinement process a little easier. I >>>>>>> apologize if there was any confusion. >>>>>>> >>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>> Thanks. >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>> wrote: >>>>>>>> >>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>> on. >>>>>>>> >>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. 
>>>>>>>>> >>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>> -log_summary >>>>>>>>> >>>>>>>>> Max Max/Min Avg Total >>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>> >>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>> -log_summary >>>>>>>>> >>>>>>>>> Max Max/Min Avg Total >>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>> >>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>> but >>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>> know. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>> wrote: >>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Matt, >>>>>>>>>>> >>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>> performance >>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>> single >>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>> take >>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>> solution. But >>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>> let you >>>>>>>>>>> know what that yields. >>>>>>>>>>> >>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>> to >>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>> if >>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>> option >>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>> ideas are >>>>>>>>>>> much appreciated. >>>>>>>>>> >>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>> On most >>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>> using more >>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>> are not >>>>>>>>>> actually >>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>> is why I >>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>> processes were >>>>>>>>>> run and breaks down the time. >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>> wrote: >>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>> installation to >>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. 
But >>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>> problems, >>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>> expected. >>>>>>>>>>>>> My configure options are >>>>>>>>>>>>> >>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>> >>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>> configure >>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>> programs with >>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>> approximately the >>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>> testing >>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>> custom >>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>> unknowns. >>>>>>>>>>>>> >>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>> configuration or >>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>> know. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>> experiments >>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>> experiments >>>>>>>>>>>> lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments >>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>> experiments >>>>>>>>>> lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener >> > From jed at 59A2.org Thu Feb 3 13:25:26 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 16:25:26 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 16:17, Barry Smith wrote: > In src/benchmarks/streams you can run make test and have it generate a > report of how the streams benchmark is able to utilize the memory bandwidth. > Run that and send us the output (run with just 2 threads). 
That test does no software prefetch, is not vectorized (look at the assembly, you want all movapd and addpd/mulpd with memory addresses instead of addsd/mulsd or addpd/mulpd operating only on register operands), and is not NUMA-aware (which depending on the hardware, can cause performance problems). The output is still relevant and indicates what can be done without tuning, but does not accurately represent the peak achievable by the hardware. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 13:30:31 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 13:30:31 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> On Feb 3, 2011, at 1:25 PM, Jed Brown wrote: > On Thu, Feb 3, 2011 at 16:17, Barry Smith wrote: > In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). > > That test does no software prefetch, is not vectorized (look at the assembly, you want all movapd and addpd/mulpd with memory addresses instead of addsd/mulsd or addpd/mulpd operating only on register operands), and is not NUMA-aware (which depending on the hardware, can cause performance problems). The output is still relevant and indicates what can be done without tuning, but does not accurately represent the peak achievable by the hardware. Completely true. If you are aware of a "sophisticated" portable streams tester please add it to that directory. I'd love to have it. It gives an idea of what "code just compiled by the compiler can do" which is what we need in this situation, in particular what happens in going from 1 process to 2 processes. Barry From jed at 59A2.org Thu Feb 3 13:39:01 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 16:39:01 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> <9F199EA2-E304-49F5-8EBF-2068125A1378@mcs.anl.gov> Message-ID: On Thu, Feb 3, 2011 at 16:30, Barry Smith wrote: > Completely true. If you are aware of a "sophisticated" portable streams > tester please add it to that directory. I'd love to have it. Not portable, but I have some better code for x86/64. I believe Aron has something good for Blue Gene. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Thu Feb 3 13:41:44 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 13:41:44 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, Thanks for the quick reply. I ran the benchmark/stream/BasicVersion for one and two processes and the output are as follows: -n 1 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. 
Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 2529 microseconds. (= 2529 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 10161.8510 0.0032 0.0031 0.0037 Scale: 9843.6177 0.0034 0.0033 0.0038 Add: 10656.7114 0.0046 0.0045 0.0053 Triad: 10799.0448 0.0046 0.0044 0.0054 -n 2 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 4320 microseconds. (= 4320 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 5739.9704 0.0058 0.0056 0.0063 Scale: 5839.3617 0.0058 0.0055 0.0062 Add: 6116.9323 0.0081 0.0078 0.0085 Triad: 6021.0722 0.0084 0.0080 0.0088 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. Each test is run 50 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 2954 microseconds. (= 2954 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 6091.9448 0.0056 0.0053 0.0061 Scale: 5501.1775 0.0060 0.0058 0.0062 Add: 5960.4640 0.0084 0.0081 0.0087 Triad: 5936.2109 0.0083 0.0081 0.0089 I do not have OpenMP installed and so not sure if you wanted that when you said two threads. I also closed most of the applications that were open before running these tests and so they should hopefully be accurate. Vijay On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: > > ?Vljay > > ? 
Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes > > ------------------------------------------------------------------------------------------------------------------------ > Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total > ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > ?1 process > VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 > > ?2 processes > VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 > > ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) > > 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. > > 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. > > 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. > > ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). > > ? Barry > > > On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: > >> Matt, >> >> I apologize for the incomplete information. Find attached the >> log_summary for all the cases. >> >> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >> bandwidth with this information but if you need anything more, do let >> me know. >> >> VIjay >> >> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>> wrote: >>>> >>>> Barry, >>>> >>>> Sorry about the delay in the reply. I did not have access to the >>>> system to test out what you said, until now. >>>> >>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>> >>>> processor ? ? ? time >>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>> >>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>> this data. >>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>> crucial and >>> ? ? you cannot begin to understand speedup on it until you do. Please look >>> this up. >>> 3) Worrying about specifics of the MPI implementation makes no sense until >>> the basics are nailed down. >>> ? ?Matt >>> >>>> >>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>> wondering if the fault is in the MPI configuration itself. Are these >>>> results as you would expect ? 
I can also send you the log_summary for >>>> all cases if that will help. >>>> >>>> Vijay >>>> >>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>> >>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Barry, >>>>>> >>>>>> I understand what you are saying but which example/options then is the >>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>> multi-core system. From your experience, do you think there is another >>>>>> example program that will demonstrate this much more rigorously or >>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>> ex20.c and that was the original motivation for this thread. >>>>> >>>>> ? Did you follow my instructions? >>>>> >>>>> ? Barry >>>>> >>>>>> >>>>>> Satish. I configured with --download-mpich now without the >>>>>> mpich-device. The results are given above. I will try with the options >>>>>> you provided although I dont entirely understand what they mean, which >>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>> ? >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>> >>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>> (in the two process case) >>>>>>> >>>>>>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>> >>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>> >>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>> >>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>> memory bandwidth limited; >>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>> processors but crummy speed. >>>>>>> >>>>>>> ?Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. 
Mahadevan wrote: >>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>> this >>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>> apologize if there was any confusion. >>>>>>>> >>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>> Thanks. >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>> on. >>>>>>>>> >>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>> >>>>>>>>> ? Barry >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>> >>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>> -log_summary >>>>>>>>>> >>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>>>>>> >>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>> -log_summary >>>>>>>>>> >>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>>>>>> >>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>> but >>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>> know. >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>> wrote: >>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Matt, >>>>>>>>>>>> >>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>> performance >>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>> single >>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>> take >>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>> solution. But >>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>> let you >>>>>>>>>>>> know what that yields. 
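A note on reading the -log_summary headline numbers quoted above: the aggregate flop rate nearly doubles on two processes, but the total flop count also grows (the solver does extra work), so the wall-clock speedup is much smaller. The snippet below simply redoes that arithmetic with values copied from the quoted output; it is an illustration, not a measurement.

#include <stdio.h>

/* Headline numbers copied from the quoted -log_summary output for
   ./ex20 -grid 20 on 1 and 2 processes. */
int main(void)
{
  const double t1 = 8.452,   t2 = 7.851;    /* Time (sec), max over ranks */
  const double f1 = 5.045e9, f2 = 9.313e9;  /* Flops, total over ranks    */
  printf("wall-clock speedup        : %.2f\n", t1 / t2);
  printf("total flops, 2 vs 1 procs : %.2f x\n", f2 / f1);
  printf("aggregate flop-rate ratio : %.2f x\n", (f2 / t2) / (f1 / t1));
  return 0;
}

This prints a wall-clock speedup of about 1.08 even though the aggregate flop-rate ratio is about 1.99, because roughly 1.85 times as many flops are performed in the two-process run.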
>>>>>>>>>>>> >>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>> to >>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>> if >>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>> option >>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>> ideas are >>>>>>>>>>>> much appreciated. >>>>>>>>>>> >>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>> On most >>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>> using more >>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>> are not >>>>>>>>>>> actually >>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>> is why I >>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>> processes were >>>>>>>>>>> run and breaks down the time. >>>>>>>>>>> ? ?Matt >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>> installation to >>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>> problems, >>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>> expected. >>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>> >>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>> >>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>> configure >>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>> programs with >>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>> testing >>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>> custom >>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>> know. 
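To make the memory-bandwidth argument above concrete, here is a minimal compressed-sparse-row matrix-vector product, the kind of kernel that sits behind MatMult for AIJ matrices. This is an editorial sketch, not PETSc source: per nonzero it loads an 8-byte value, a 4-byte column index and an entry of x, yet performs only one multiply and one add, so the loop runs at the speed of the memory system rather than of the core's floating-point units. Adding cores that share a single memory bus therefore buys very little.

/* Minimal CSR sparse matrix-vector product y = A*x (illustrative sketch,
   not PETSc source).  The loop is memory-bandwidth bound: roughly 12+ bytes
   of traffic per nonzero for only 2 flops. */
void csr_matvec(int nrows, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
  for (int i = 0; i < nrows; i++) {
    double sum = 0.0;
    for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
      sum += val[j] * x[colind[j]];
    y[i] = sum;
  }
}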
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments >>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>> experiments >>>>>>>>>>> lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments >>> is infinitely more interesting than any results to which their experiments >>> lead. >>> -- Norbert Wiener >>> >> > > From bsmith at mcs.anl.gov Thu Feb 3 16:00:22 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 16:00:22 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() -------------- next part -------------- A non-text attachment was scrubbed... Name: BasicVersion.c Type: application/octet-stream Size: 5948 bytes Desc: not available URL: -------------- next part -------------- ; it is probably not a great way to do it, but better than nothing. Please try that one. Thanks Barry On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: > Barry, > > Thanks for the quick reply. I ran the benchmark/stream/BasicVersion > for one and two processes and the output are as follows: > > -n 1 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. > Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 2529 microseconds. > (= 2529 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 10161.8510 0.0032 0.0031 0.0037 > Scale: 9843.6177 0.0034 0.0033 0.0038 > Add: 10656.7114 0.0046 0.0045 0.0053 > Triad: 10799.0448 0.0046 0.0044 0.0054 > > -n 2 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. 
> Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 4320 microseconds. > (= 4320 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 5739.9704 0.0058 0.0056 0.0063 > Scale: 5839.3617 0.0058 0.0055 0.0062 > Add: 6116.9323 0.0081 0.0078 0.0085 > Triad: 6021.0722 0.0084 0.0080 0.0088 > ------------------------------------------------------------- > This system uses 8 bytes per DOUBLE PRECISION word. > ------------------------------------------------------------- > Array size = 2000000, Offset = 0 > Total memory required = 45.8 MB. > Each test is run 50 times, but only > the *best* time for each is used. > ------------------------------------------------------------- > Your clock granularity/precision appears to be 1 microseconds. > Each test below will take on the order of 2954 microseconds. > (= 2954 clock ticks) > Increase the size of the arrays if this shows that > you are not getting at least 20 clock ticks per test. > ------------------------------------------------------------- > WARNING -- The above is only a rough guideline. > For best results, please be sure you know the > precision of your system timer. > ------------------------------------------------------------- > Function Rate (MB/s) RMS time Min time Max time > Copy: 6091.9448 0.0056 0.0053 0.0061 > Scale: 5501.1775 0.0060 0.0058 0.0062 > Add: 5960.4640 0.0084 0.0081 0.0087 > Triad: 5936.2109 0.0083 0.0081 0.0089 > > I do not have OpenMP installed and so not sure if you wanted that when > you said two threads. I also closed most of the applications that were > open before running these tests and so they should hopefully be > accurate. > > Vijay > > > On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >> >> Vljay >> >> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> 1 process >> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >> >> 2 processes >> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >> >> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >> >> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 
>> >> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >> >> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >> >> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >> >> Barry >> >> >> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >> >>> Matt, >>> >>> I apologize for the incomplete information. Find attached the >>> log_summary for all the cases. >>> >>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>> bandwidth with this information but if you need anything more, do let >>> me know. >>> >>> VIjay >>> >>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>> wrote: >>>>> >>>>> Barry, >>>>> >>>>> Sorry about the delay in the reply. I did not have access to the >>>>> system to test out what you said, until now. >>>>> >>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>> >>>>> processor time >>>>> 1 114.2 >>>>> 2 89.45 >>>>> 4 81.01 >>>> >>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>> this data. >>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>> crucial and >>>> you cannot begin to understand speedup on it until you do. Please look >>>> this up. >>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>> the basics are nailed down. >>>> Matt >>>> >>>>> >>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>> results as you would expect ? I can also send you the log_summary for >>>>> all cases if that will help. >>>>> >>>>> Vijay >>>>> >>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>> >>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> I understand what you are saying but which example/options then is the >>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>> multi-core system. From your experience, do you think there is another >>>>>>> example program that will demonstrate this much more rigorously or >>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>> ex20.c and that was the original motivation for this thread. >>>>>> >>>>>> Did you follow my instructions? >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Satish. I configured with --download-mpich now without the >>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>> ? 
>>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>> >>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>> (in the two process case) >>>>>>>> >>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>> >>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>> >>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>> >>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>> memory bandwidth limited; >>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>> processors but crummy speed. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>> this >>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>> apologize if there was any confusion. >>>>>>>>> >>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>> on. >>>>>>>>>> >>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. 
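The "74 percent of the solve is not parallel" observation above is Amdahl's law at work. The sketch below shows the resulting ceiling on speedup under the optimistic assumption that everything outside the redundant coarse-grid LU factorization scales perfectly; the 0.74 fraction is taken from the quoted log_summary analysis.

#include <stdio.h>

/* Best-case speedup when a fraction 'serial' of the run (the replicated
   coarse-grid LU solve, ~74% here) gets no benefit from extra processes.
   Assumes, optimistically, that the remaining 26% scales perfectly. */
int main(void)
{
  const double serial = 0.74;
  for (int p = 1; p <= 8; p *= 2)
    printf("p = %d  best-case speedup = %.2f\n",
           p, 1.0 / (serial + (1.0 - serial) / p));
  return 0;
}

Even with perfect parallelism in the rest of the code this caps the speedup at about 1.15 on two processes and about 1.35 as p grows large, which is why shrinking the coarse grid (or adding levels with -dmmg_nlevels) matters so much.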
>>>>>>>>>>> >>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>> -log_summary >>>>>>>>>>> >>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>> >>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>> -log_summary >>>>>>>>>>> >>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>> >>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>> but >>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>> know. >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>> wrote: >>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Matt, >>>>>>>>>>>>> >>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>> performance >>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>> single >>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>> take >>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>> solution. But >>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>> let you >>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>> >>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>> to >>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>> if >>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>> option >>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>> ideas are >>>>>>>>>>>>> much appreciated. >>>>>>>>>>>> >>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>> On most >>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>> using more >>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>> are not >>>>>>>>>>>> actually >>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>> is why I >>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>> processes were >>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. 
Mahadevan >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> lead. >>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>> experiments >>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>> experiments >>>>>>>>>>>> lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments >>>> is infinitely more interesting than any results to which their experiments >>>> lead. >>>> -- Norbert Wiener >>>> >>> >> >> From vijay.m at gmail.com Thu Feb 3 16:29:22 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 16:29:22 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, The outputs are attached. 
I do not see a big difference from the earlier results as you mentioned. Let me know if there exist a similar benchmark that might help. Vijay On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: > > ? Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. > > ? I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() > ; it is probably not a great way to do it, but better than nothing. Please try that one. > > ? ?Thanks > > > ? Barry > > > On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >> for one and two processes and the output are as follows: >> >> -n 1 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. >> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 2529 microseconds. >> ? (= 2529 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? 10161.8510 ? ? ? 0.0032 ? ? ? 0.0031 ? ? ? 0.0037 >> Scale: ? ? ? 9843.6177 ? ? ? 0.0034 ? ? ? 0.0033 ? ? ? 0.0038 >> Add: ? ? ? ?10656.7114 ? ? ? 0.0046 ? ? ? 0.0045 ? ? ? 0.0053 >> Triad: ? ? ?10799.0448 ? ? ? 0.0046 ? ? ? 0.0044 ? ? ? 0.0054 >> >> -n 2 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. >> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 4320 microseconds. >> ? (= 4320 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? ?5739.9704 ? ? ? 0.0058 ? ? ? 0.0056 ? ? ? 0.0063 >> Scale: ? ? ? 5839.3617 ? ? ? 0.0058 ? ? ? 0.0055 ? ? ? 0.0062 >> Add: ? ? ? ? 6116.9323 ? ? ? 0.0081 ? ? ? 0.0078 ? ? ? 0.0085 >> Triad: ? ? ? 6021.0722 ? ? ? 0.0084 ? ? ? 0.0080 ? ? ? 0.0088 >> ------------------------------------------------------------- >> This system uses 8 bytes per DOUBLE PRECISION word. 
>> ------------------------------------------------------------- >> Array size = 2000000, Offset = 0 >> Total memory required = 45.8 MB. >> Each test is run 50 times, but only >> the *best* time for each is used. >> ------------------------------------------------------------- >> Your clock granularity/precision appears to be 1 microseconds. >> Each test below will take on the order of 2954 microseconds. >> ? (= 2954 clock ticks) >> Increase the size of the arrays if this shows that >> you are not getting at least 20 clock ticks per test. >> ------------------------------------------------------------- >> WARNING -- The above is only a rough guideline. >> For best results, please be sure you know the >> precision of your system timer. >> ------------------------------------------------------------- >> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >> Copy: ? ? ? ?6091.9448 ? ? ? 0.0056 ? ? ? 0.0053 ? ? ? 0.0061 >> Scale: ? ? ? 5501.1775 ? ? ? 0.0060 ? ? ? 0.0058 ? ? ? 0.0062 >> Add: ? ? ? ? 5960.4640 ? ? ? 0.0084 ? ? ? 0.0081 ? ? ? 0.0087 >> Triad: ? ? ? 5936.2109 ? ? ? 0.0083 ? ? ? 0.0081 ? ? ? 0.0089 >> >> I do not have OpenMP installed and so not sure if you wanted that when >> you said two threads. I also closed most of the applications that were >> open before running these tests and so they should hopefully be >> accurate. >> >> Vijay >> >> >> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>> >>> ?Vljay >>> >>> ? Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total >>> ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> ?1 process >>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 >>> >>> ?2 processes >>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 >>> >>> ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>> >>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>> >>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>> >>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>> >>> ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>> >>> ? Barry >>> >>> >>> On Feb 3, 2011, at 12:05 PM, Vijay S. 
Mahadevan wrote: >>> >>>> Matt, >>>> >>>> I apologize for the incomplete information. Find attached the >>>> log_summary for all the cases. >>>> >>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>> bandwidth with this information but if you need anything more, do let >>>> me know. >>>> >>>> VIjay >>>> >>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>> wrote: >>>>>> >>>>>> Barry, >>>>>> >>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>> system to test out what you said, until now. >>>>>> >>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>> >>>>>> processor ? ? ? time >>>>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>>>> >>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>> this data. >>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>> crucial and >>>>> ? ? you cannot begin to understand speedup on it until you do. Please look >>>>> this up. >>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>> the basics are nailed down. >>>>> ? ?Matt >>>>> >>>>>> >>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>> results as you would expect ? I can also send you the log_summary for >>>>>> all cases if that will help. >>>>>> >>>>>> Vijay >>>>>> >>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>> >>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>> >>>>>>> ? Did you follow my instructions? >>>>>>> >>>>>>> ? Barry >>>>>>> >>>>>>>> >>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>> ? >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>> >>>>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>> (in the two process case) >>>>>>>>> >>>>>>>>> MatLUFactorNum ? ? ? 
?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>>>> >>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>> >>>>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>> >>>>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>> memory bandwidth limited; >>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>> processors but crummy speed. >>>>>>>>> >>>>>>>>> ?Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Barry, >>>>>>>>>> >>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>> this >>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>> apologize if there was any confusion. >>>>>>>>>> >>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>>>> on. >>>>>>>>>>> >>>>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>> >>>>>>>>>>> ? Barry >>>>>>>>>>> >>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>> >>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>> -log_summary >>>>>>>>>>>> >>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? 
?1.00000 >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>> -log_summary >>>>>>>>>>>> >>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? ?1.00000 >>>>>>>>>>>> >>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>> but >>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>> know. >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>> >>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>> single >>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>> take >>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>> let you >>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>> to >>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>> if >>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>> option >>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>> >>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>> On most >>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>> using more >>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>> are not >>>>>>>>>>>>> actually >>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>> is why I >>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>> processes were >>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>> experiments >>>>>>>>>>>>> lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments >>>>> is infinitely more interesting than any results to which their experiments >>>>> lead. >>>>> -- Norbert Wiener >>>>> >>>> >>> >>> > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: basicversion_np1.out Type: application/octet-stream Size: 999 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: basicversion_np2.out Type: application/octet-stream Size: 1999 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Feb 3 16:46:02 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 16:46:02 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. Barry * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > Barry, > > The outputs are attached. I do not see a big difference from the > earlier results as you mentioned. > > Let me know if there exist a similar benchmark that might help. > > Vijay > > On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >> >> Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >> >> I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >> ; it is probably not a great way to do it, but better than nothing. Please try that one. >> >> Thanks >> >> >> Barry >> >> >> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>> for one and two processes and the output are as follows: >>> >>> -n 1 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. >>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 2529 microseconds. >>> (= 2529 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 10161.8510 0.0032 0.0031 0.0037 >>> Scale: 9843.6177 0.0034 0.0033 0.0038 >>> Add: 10656.7114 0.0046 0.0045 0.0053 >>> Triad: 10799.0448 0.0046 0.0044 0.0054 >>> >>> -n 2 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. 
>>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 4320 microseconds. >>> (= 4320 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 5739.9704 0.0058 0.0056 0.0063 >>> Scale: 5839.3617 0.0058 0.0055 0.0062 >>> Add: 6116.9323 0.0081 0.0078 0.0085 >>> Triad: 6021.0722 0.0084 0.0080 0.0088 >>> ------------------------------------------------------------- >>> This system uses 8 bytes per DOUBLE PRECISION word. >>> ------------------------------------------------------------- >>> Array size = 2000000, Offset = 0 >>> Total memory required = 45.8 MB. >>> Each test is run 50 times, but only >>> the *best* time for each is used. >>> ------------------------------------------------------------- >>> Your clock granularity/precision appears to be 1 microseconds. >>> Each test below will take on the order of 2954 microseconds. >>> (= 2954 clock ticks) >>> Increase the size of the arrays if this shows that >>> you are not getting at least 20 clock ticks per test. >>> ------------------------------------------------------------- >>> WARNING -- The above is only a rough guideline. >>> For best results, please be sure you know the >>> precision of your system timer. >>> ------------------------------------------------------------- >>> Function Rate (MB/s) RMS time Min time Max time >>> Copy: 6091.9448 0.0056 0.0053 0.0061 >>> Scale: 5501.1775 0.0060 0.0058 0.0062 >>> Add: 5960.4640 0.0084 0.0081 0.0087 >>> Triad: 5936.2109 0.0083 0.0081 0.0089 >>> >>> I do not have OpenMP installed and so not sure if you wanted that when >>> you said two threads. I also closed most of the applications that were >>> open before running these tests and so they should hopefully be >>> accurate. >>> >>> Vijay >>> >>> >>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>> >>>> Vljay >>>> >>>> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>> >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> 1 process >>>> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >>>> >>>> 2 processes >>>> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >>>> >>>> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>> >>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. 
>>>> >>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>> >>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>> >>>> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>> >>>> Barry >>>> >>>> >>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Matt, >>>>> >>>>> I apologize for the incomplete information. Find attached the >>>>> log_summary for all the cases. >>>>> >>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>> bandwidth with this information but if you need anything more, do let >>>>> me know. >>>>> >>>>> VIjay >>>>> >>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>> wrote: >>>>>>> >>>>>>> Barry, >>>>>>> >>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>> system to test out what you said, until now. >>>>>>> >>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>> >>>>>>> processor time >>>>>>> 1 114.2 >>>>>>> 2 89.45 >>>>>>> 4 81.01 >>>>>> >>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>> this data. >>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>> crucial and >>>>>> you cannot begin to understand speedup on it until you do. Please look >>>>>> this up. >>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>> the basics are nailed down. >>>>>> Matt >>>>>> >>>>>>> >>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>> all cases if that will help. >>>>>>> >>>>>>> Vijay >>>>>>> >>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>> >>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>> >>>>>>>> Did you follow my instructions? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>>> >>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>> mpich-device. The results are given above. 
I will try with the options >>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>> ? >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>> >>>>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>> (in the two process case) >>>>>>>>>> >>>>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>>>> >>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>> >>>>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>>> >>>>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>> memory bandwidth limited; >>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>> processors but crummy speed. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Barry, >>>>>>>>>>> >>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>> this >>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>> >>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>>>> on. >>>>>>>>>>>> >>>>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. 
Mahadevan wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>> -log_summary >>>>>>>>>>>>> >>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>> -log_summary >>>>>>>>>>>>> >>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>>>> >>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>> but >>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>> know. >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>> single >>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>> take >>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>> option >>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>> On most >>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>> using more >>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>> are not >>>>>>>>>>>>>> actually >>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>> is why I >>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>> processes were >>>>>>>>>>>>>> run and breaks down the time. 
>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> lead. >>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments >>>>>> is infinitely more interesting than any results to which their experiments >>>>>> lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>> >>>> >>>> >> >> >> > From vijay.m at gmail.com Thu Feb 3 17:31:04 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 17:31:04 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Barry, That sucks. I am sure that it is not a single processor machine although I've not yet opened it up and checked it for sure ;) It is dual booted with windows and I am going to use the Intel performance counters to find the bandwidth limit in windows/linux. Also, I did find a benchmark for Ubuntu after bit of searching around and will try to see if it can provide more details. Here are the links for the benchmarks. http://software.intel.com/en-us/articles/intel-performance-counter-monitor/ http://manpages.ubuntu.com/manpages/maverick/lmbench.8.html Hopefully the numbers from Windows and Ubuntu will match and if not, maybe my Ubuntu configuration needs a bit of tweaking to get this correct. I will keep you updated if I find something interesting. Thanks for all the helpful comments ! Vijay On Thu, Feb 3, 2011 at 4:46 PM, Barry Smith wrote: > > ? Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. > > ? Barry > > * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. > > On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > >> Barry, >> >> The outputs are attached. I do not see a big difference from the >> earlier results as you mentioned. >> >> Let me know if there exist a similar benchmark that might help. >> >> Vijay >> >> On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >>> >>> ? Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >>> >>> ? I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >>> ; it is probably not a great way to do it, but better than nothing. Please try that one. >>> >>> ? ?Thanks >>> >>> >>> ? Barry >>> >>> >>> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >>> >>>> Barry, >>>> >>>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>>> for one and two processes and the output are as follows: >>>> >>>> -n 1 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 2529 microseconds. >>>> ? (= 2529 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. 
>>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? 10161.8510 ? ? ? 0.0032 ? ? ? 0.0031 ? ? ? 0.0037 >>>> Scale: ? ? ? 9843.6177 ? ? ? 0.0034 ? ? ? 0.0033 ? ? ? 0.0038 >>>> Add: ? ? ? ?10656.7114 ? ? ? 0.0046 ? ? ? 0.0045 ? ? ? 0.0053 >>>> Triad: ? ? ?10799.0448 ? ? ? 0.0046 ? ? ? 0.0044 ? ? ? 0.0054 >>>> >>>> -n 2 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 4320 microseconds. >>>> ? (= 4320 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. >>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? ?5739.9704 ? ? ? 0.0058 ? ? ? 0.0056 ? ? ? 0.0063 >>>> Scale: ? ? ? 5839.3617 ? ? ? 0.0058 ? ? ? 0.0055 ? ? ? 0.0062 >>>> Add: ? ? ? ? 6116.9323 ? ? ? 0.0081 ? ? ? 0.0078 ? ? ? 0.0085 >>>> Triad: ? ? ? 6021.0722 ? ? ? 0.0084 ? ? ? 0.0080 ? ? ? 0.0088 >>>> ------------------------------------------------------------- >>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>> ------------------------------------------------------------- >>>> Array size = 2000000, Offset = 0 >>>> Total memory required = 45.8 MB. >>>> Each test is run 50 times, but only >>>> the *best* time for each is used. >>>> ------------------------------------------------------------- >>>> Your clock granularity/precision appears to be 1 microseconds. >>>> Each test below will take on the order of 2954 microseconds. >>>> ? (= 2954 clock ticks) >>>> Increase the size of the arrays if this shows that >>>> you are not getting at least 20 clock ticks per test. >>>> ------------------------------------------------------------- >>>> WARNING -- The above is only a rough guideline. >>>> For best results, please be sure you know the >>>> precision of your system timer. >>>> ------------------------------------------------------------- >>>> Function ? ? ?Rate (MB/s) ? RMS time ? ? Min time ? ? Max time >>>> Copy: ? ? ? ?6091.9448 ? ? ? 0.0056 ? ? ? 0.0053 ? ? ? 0.0061 >>>> Scale: ? ? ? 5501.1775 ? ? ? 0.0060 ? ? ? 0.0058 ? ? ? 0.0062 >>>> Add: ? ? ? ? 5960.4640 ? ? ? 0.0084 ? ? ? 0.0081 ? ? ? 0.0087 >>>> Triad: ? ? ? 5936.2109 ? ? ? 0.0083 ? ? ? 0.0081 ? ? ? 0.0089 >>>> >>>> I do not have OpenMP installed and so not sure if you wanted that when >>>> you said two threads. I also closed most of the applications that were >>>> open before running these tests and so they should hopefully be >>>> accurate. 
>>>> >>>> Vijay >>>> >>>> >>>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>>> >>>>> ?Vljay >>>>> >>>>> ? Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>>> >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> Event ? ? ? ? ? ? ? ?Count ? ? ?Time (sec) ? ? Flops ? ? ? ? ? ? ? ? ? ? ? ? ? ? --- Global --- ?--- Stage --- ? Total >>>>> ? ? ? ? ? ? ? ? ? Max Ratio ?Max ? ? Ratio ? Max ?Ratio ?Mess ? Avg len Reduct ?%T %F %M %L %R ?%T %F %M %L %R Mflop/s >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> ?1 process >>>>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?29 40 ?0 ?0 ?0 ?1983 >>>>> >>>>> ?2 processes >>>>> VecMAXPY ? ? ? ? ? ?3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 ?0 ?0 ?0 ?31 40 ?0 ?0 ?0 ?2443 >>>>> >>>>> ? The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 ?which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>>> >>>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>>>> >>>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>>> >>>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>>> >>>>> ?In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>>> >>>>> ? Barry >>>>> >>>>> >>>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>>> >>>>>> Matt, >>>>>> >>>>>> I apologize for the incomplete information. Find attached the >>>>>> log_summary for all the cases. >>>>>> >>>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>>> bandwidth with this information but if you need anything more, do let >>>>>> me know. >>>>>> >>>>>> VIjay >>>>>> >>>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>>> wrote: >>>>>>>> >>>>>>>> Barry, >>>>>>>> >>>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>>> system to test out what you said, until now. >>>>>>>> >>>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>>> >>>>>>>> processor ? ? ? time >>>>>>>> 1 ? ? ? ? ? ? ? ? ? ? ?114.2 >>>>>>>> 2 ? ? ? ? ? ? ? ? ? ? ?89.45 >>>>>>>> 4 ? ? ? ? ? ? ? ? ? ? ?81.01 >>>>>>> >>>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>>> this data. >>>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>>> crucial and >>>>>>> ? ? you cannot begin to understand speedup on it until you do. Please look >>>>>>> this up. 
>>>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>>> the basics are nailed down. >>>>>>> ? ?Matt >>>>>>> >>>>>>>> >>>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>>> all cases if that will help. >>>>>>>> >>>>>>>> Vijay >>>>>>>> >>>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>>> >>>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>>> >>>>>>>>>> Barry, >>>>>>>>>> >>>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>>> >>>>>>>>> ? Did you follow my instructions? >>>>>>>>> >>>>>>>>> ? Barry >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>>> ? >>>>>>>>>> >>>>>>>>>> Vijay >>>>>>>>>> >>>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>>> >>>>>>>>>>> ? Ok, everything makes sense. Looks like you are using two level >>>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>>> -mg_coarse_redundant_pc_type lu ?This means it is solving the coarse grid >>>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>>> (in the two process case) >>>>>>>>>>> >>>>>>>>>>> MatLUFactorNum ? ? ? ?14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>>> 0.0e+00 0.0e+00 37 41 ?0 ?0 ?0 ?74 82 ?0 ?0 ?0 ?1307 >>>>>>>>>>> MatILUFactorSym ? ? ? ?7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>>> 0.0e+00 7.0e+00 ?0 ?0 ?0 ?0 ?1 ? 0 ?0 ?0 ?0 ?2 ? ? 0 >>>>>>>>>>> >>>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>>> flops). ? When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>>> expect much speedup. ?If you run with -snes_view it will display exactly the >>>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>>> >>>>>>>>>>> ?Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. 
>>>>>>>>>>> >>>>>>>>>>> ?You should get pretty good speed up for 2 processes but not much >>>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>>> memory bandwidth limited; >>>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. Note >>>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>>> processors but crummy speed. >>>>>>>>>>> >>>>>>>>>>> ?Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>> >>>>>>>>>>>> Barry, >>>>>>>>>>>> >>>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>>> this >>>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>>> >>>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Vijay >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ?We need all the information from -log_summary to see what is going >>>>>>>>>>>>> on. >>>>>>>>>>>>> >>>>>>>>>>>>> ?Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>>> >>>>>>>>>>>>> ? Barry >>>>>>>>>>>>> >>>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>>>> Time (sec): ? ? ? ? ? 8.452e+00 ? ? ?1.00000 ? 8.452e+00 >>>>>>>>>>>>>> Objects: ? ? ? ? ? ? ?1.470e+02 ? ? ?1.00000 ? 1.470e+02 >>>>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?5.045e+09 ? ? ?1.00000 ? 5.045e+09 ?5.045e+09 >>>>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.969e+08 ? ? ?1.00000 ? 5.969e+08 ?5.969e+08 >>>>>>>>>>>>>> MPI Messages: ? ? ? ? 0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>>>> MPI Message Lengths: ?0.000e+00 ? ? ?0.00000 ? 0.000e+00 ?0.000e+00 >>>>>>>>>>>>>> MPI Reductions: ? ? ? 4.440e+02 ? ? ?1.00000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? Max ? ? ? Max/Min ? ? ? ?Avg ? ? ?Total >>>>>>>>>>>>>> Time (sec): ? ? ? ? ? 7.851e+00 ? ? ?1.00000 ? 7.851e+00 >>>>>>>>>>>>>> Objects: ? ? ? ? ? ? ?2.000e+02 ? ? ?1.00000 ? 2.000e+02 >>>>>>>>>>>>>> Flops: ? ? ? ? ? ? ? ?4.670e+09 ? ? ?1.00580 ? 4.657e+09 ?9.313e+09 >>>>>>>>>>>>>> Flops/sec: ? ? ? ? ? ?5.948e+08 ? ? ?1.00580 ? 5.931e+08 ?1.186e+09 >>>>>>>>>>>>>> MPI Messages: ? ? ? ? 7.965e+02 ? ? ?1.00000 ? 7.965e+02 ?1.593e+03 >>>>>>>>>>>>>> MPI Message Lengths: ?1.412e+07 ? ? ?1.00000 ? 1.773e+04 ?2.824e+07 >>>>>>>>>>>>>> MPI Reductions: ? ? ? 1.046e+03 ? ? 
?1.00000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>>> but >>>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>>> know. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>>> take >>>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>>> On most >>>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>>> using more >>>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>>> are not >>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>>> is why I >>>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>>> processes were >>>>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>>> ? ?Matt >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>> lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>> >>>>> >>>>> >>> >>> >>> >> > > From bsmith at mcs.anl.gov Thu Feb 3 17:57:30 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 17:57:30 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Feb 3, 2011, at 5:31 PM, Vijay S. Mahadevan wrote: > Barry, > > That sucks. I am sure that it is not a single processor machine > although I've not yet opened it up and checked it for sure ;) I didn't mean that it was literally a single processor machine, just effectively for iterative linear solvers. 
Barry > It is > dual booted with windows and I am going to use the Intel performance > counters to find the bandwidth limit in windows/linux. Also, I did > find a benchmark for Ubuntu after bit of searching around and will try > to see if it can provide more details. Here are the links for the > benchmarks. > > http://software.intel.com/en-us/articles/intel-performance-counter-monitor/ > http://manpages.ubuntu.com/manpages/maverick/lmbench.8.html > > Hopefully the numbers from Windows and Ubuntu will match and if not, > maybe my Ubuntu configuration needs a bit of tweaking to get this > correct. I will keep you updated if I find something interesting. > Thanks for all the helpful comments ! > > Vijay > > On Thu, Feb 3, 2011 at 4:46 PM, Barry Smith wrote: >> >> Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >> >> Barry >> >> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >> >> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >> >>> Barry, >>> >>> The outputs are attached. I do not see a big difference from the >>> earlier results as you mentioned. >>> >>> Let me know if there exist a similar benchmark that might help. >>> >>> Vijay >>> >>> On Thu, Feb 3, 2011 at 4:00 PM, Barry Smith wrote: >>>> >>>> Hmm, just running the basic version with mpiexec -n 2 processes isn't that useful because there is nothing to make sure they are both running at exactly the same time. >>>> >>>> I've attached a new version of BasicVersion.c that attempts to synchronize the operations in the two processes using MPI_Barrier() >>>> ; it is probably not a great way to do it, but better than nothing. Please try that one. >>>> >>>> Thanks >>>> >>>> >>>> Barry >>>> >>>> >>>> On Feb 3, 2011, at 1:41 PM, Vijay S. Mahadevan wrote: >>>> >>>>> Barry, >>>>> >>>>> Thanks for the quick reply. I ran the benchmark/stream/BasicVersion >>>>> for one and two processes and the output are as follows: >>>>> >>>>> -n 1 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 2529 microseconds. >>>>> (= 2529 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. 
>>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 10161.8510 0.0032 0.0031 0.0037 >>>>> Scale: 9843.6177 0.0034 0.0033 0.0038 >>>>> Add: 10656.7114 0.0046 0.0045 0.0053 >>>>> Triad: 10799.0448 0.0046 0.0044 0.0054 >>>>> >>>>> -n 2 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 4320 microseconds. >>>>> (= 4320 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. >>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 5739.9704 0.0058 0.0056 0.0063 >>>>> Scale: 5839.3617 0.0058 0.0055 0.0062 >>>>> Add: 6116.9323 0.0081 0.0078 0.0085 >>>>> Triad: 6021.0722 0.0084 0.0080 0.0088 >>>>> ------------------------------------------------------------- >>>>> This system uses 8 bytes per DOUBLE PRECISION word. >>>>> ------------------------------------------------------------- >>>>> Array size = 2000000, Offset = 0 >>>>> Total memory required = 45.8 MB. >>>>> Each test is run 50 times, but only >>>>> the *best* time for each is used. >>>>> ------------------------------------------------------------- >>>>> Your clock granularity/precision appears to be 1 microseconds. >>>>> Each test below will take on the order of 2954 microseconds. >>>>> (= 2954 clock ticks) >>>>> Increase the size of the arrays if this shows that >>>>> you are not getting at least 20 clock ticks per test. >>>>> ------------------------------------------------------------- >>>>> WARNING -- The above is only a rough guideline. >>>>> For best results, please be sure you know the >>>>> precision of your system timer. >>>>> ------------------------------------------------------------- >>>>> Function Rate (MB/s) RMS time Min time Max time >>>>> Copy: 6091.9448 0.0056 0.0053 0.0061 >>>>> Scale: 5501.1775 0.0060 0.0058 0.0062 >>>>> Add: 5960.4640 0.0084 0.0081 0.0087 >>>>> Triad: 5936.2109 0.0083 0.0081 0.0089 >>>>> >>>>> I do not have OpenMP installed and so not sure if you wanted that when >>>>> you said two threads. I also closed most of the applications that were >>>>> open before running these tests and so they should hopefully be >>>>> accurate. 
>>>>> >>>>> Vijay >>>>> >>>>> >>>>> On Thu, Feb 3, 2011 at 1:17 PM, Barry Smith wrote: >>>>>> >>>>>> Vljay >>>>>> >>>>>> Let's just look at a single embarrassingly parallel computation in the run, this computation has NO communication and uses NO MPI and NO synchronization between processes >>>>>> >>>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>>> >>>>>> 1 process >>>>>> VecMAXPY 3898 1.0 1.7074e+01 1.0 3.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 29 40 0 0 0 1983 >>>>>> >>>>>> 2 processes >>>>>> VecMAXPY 3898 1.0 1.3861e+01 1.0 1.72e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 20 0 0 0 31 40 0 0 0 2443 >>>>>> >>>>>> The speed up is 1.7074e+01/1.3861e+01 = 2443./1983 = 1.23 which is terrible! Now why would it be so bad (remember you cannot blame MPI) >>>>>> >>>>>> 1) other processes are running on the machine sucking up memory bandwidth. Make sure no other compute tasks are running during this time. >>>>>> >>>>>> 2) the single process run is able to use almost all of the hardware memory bandwidth, so introducing the second process cannot increase the performance much. This means this machine is terrible for parallelization of sparse iterative solvers. >>>>>> >>>>>> 3) the machine is somehow misconfigured (beats me how) so that while the one process job doesn't use more than half of the memory bandwidth, when two processes are run the second process cannot utilize all that additional memory bandwidth. >>>>>> >>>>>> In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads). >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Feb 3, 2011, at 12:05 PM, Vijay S. Mahadevan wrote: >>>>>> >>>>>>> Matt, >>>>>>> >>>>>>> I apologize for the incomplete information. Find attached the >>>>>>> log_summary for all the cases. >>>>>>> >>>>>>> The dual quad-core system has 12 GB DDR3 SDRAM at 1333MHz with >>>>>>> 2x2GB/2x4GB configuration. I do not know how to decipher the memory >>>>>>> bandwidth with this information but if you need anything more, do let >>>>>>> me know. >>>>>>> >>>>>>> VIjay >>>>>>> >>>>>>> On Thu, Feb 3, 2011 at 11:42 AM, Matthew Knepley wrote: >>>>>>>> On Thu, Feb 3, 2011 at 11:37 AM, Vijay S. Mahadevan >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Barry, >>>>>>>>> >>>>>>>>> Sorry about the delay in the reply. I did not have access to the >>>>>>>>> system to test out what you said, until now. >>>>>>>>> >>>>>>>>> I tried with -dmmg_nlevels 5, along with the default setup: ./ex20 >>>>>>>>> -log_summary -dmmg_view -pc_type jacobi -dmmg_nlevels 5 >>>>>>>>> >>>>>>>>> processor time >>>>>>>>> 1 114.2 >>>>>>>>> 2 89.45 >>>>>>>>> 4 81.01 >>>>>>>> >>>>>>>> 1) ALWAYS ALWAYS send the full -log_summary. I cannot tell anything from >>>>>>>> this data. >>>>>>>> 2) Do you know the memory bandwidth characteristics of this machine? That is >>>>>>>> crucial and >>>>>>>> you cannot begin to understand speedup on it until you do. Please look >>>>>>>> this up. >>>>>>>> 3) Worrying about specifics of the MPI implementation makes no sense until >>>>>>>> the basics are nailed down. 
>>>>>>>> Matt >>>>>>>> >>>>>>>>> >>>>>>>>> The scaleup doesn't seem to be optimal, even with two processors. I am >>>>>>>>> wondering if the fault is in the MPI configuration itself. Are these >>>>>>>>> results as you would expect ? I can also send you the log_summary for >>>>>>>>> all cases if that will help. >>>>>>>>> >>>>>>>>> Vijay >>>>>>>>> >>>>>>>>> On Thu, Feb 3, 2011 at 11:10 AM, Barry Smith wrote: >>>>>>>>>> >>>>>>>>>> On Feb 2, 2011, at 11:13 PM, Vijay S. Mahadevan wrote: >>>>>>>>>> >>>>>>>>>>> Barry, >>>>>>>>>>> >>>>>>>>>>> I understand what you are saying but which example/options then is the >>>>>>>>>>> best one to compute the scalability in a multi-core machine ? I chose >>>>>>>>>>> the nonlinear diffusion problem specifically because of its inherent >>>>>>>>>>> stiffness that could lead probably provide noticeable scalability in a >>>>>>>>>>> multi-core system. From your experience, do you think there is another >>>>>>>>>>> example program that will demonstrate this much more rigorously or >>>>>>>>>>> clearly ? Btw, I dont get good speedup even for 2 processes with >>>>>>>>>>> ex20.c and that was the original motivation for this thread. >>>>>>>>>> >>>>>>>>>> Did you follow my instructions? >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Satish. I configured with --download-mpich now without the >>>>>>>>>>> mpich-device. The results are given above. I will try with the options >>>>>>>>>>> you provided although I dont entirely understand what they mean, which >>>>>>>>>>> kinda bugs me.. Also is OpenMPI the preferred implementation in Ubuntu >>>>>>>>>>> ? >>>>>>>>>>> >>>>>>>>>>> Vijay >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 2, 2011 at 6:35 PM, Barry Smith wrote: >>>>>>>>>>>> >>>>>>>>>>>> Ok, everything makes sense. Looks like you are using two level >>>>>>>>>>>> multigrid (coarse grid 20 by 20 by 20) with -mg_coarse_pc_type redundant >>>>>>>>>>>> -mg_coarse_redundant_pc_type lu This means it is solving the coarse grid >>>>>>>>>>>> problem redundantly on each process (each process is solving the entire >>>>>>>>>>>> coarse grid solve using LU factorization). The time for the factorization is >>>>>>>>>>>> (in the two process case) >>>>>>>>>>>> >>>>>>>>>>>> MatLUFactorNum 14 1.0 2.9096e+00 1.0 1.90e+09 1.0 0.0e+00 >>>>>>>>>>>> 0.0e+00 0.0e+00 37 41 0 0 0 74 82 0 0 0 1307 >>>>>>>>>>>> MatILUFactorSym 7 1.0 7.2970e-03 1.1 0.00e+00 0.0 0.0e+00 >>>>>>>>>>>> 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 2 0 >>>>>>>>>>>> >>>>>>>>>>>> which is 74 percent of the total solve time (and 84 percent of the >>>>>>>>>>>> flops). When 3/4th of the entire run is not parallel at all you cannot >>>>>>>>>>>> expect much speedup. If you run with -snes_view it will display exactly the >>>>>>>>>>>> solver being used. You cannot expect to understand the performance if you >>>>>>>>>>>> don't understand what the solver is actually doing. Using a 20 by 20 by 20 >>>>>>>>>>>> coarse grid is generally a bad idea since the code spends most of the time >>>>>>>>>>>> there, stick with something like 5 by 5 by 5. >>>>>>>>>>>> >>>>>>>>>>>> Suggest running with the default grid and -dmmg_nlevels 5 now the >>>>>>>>>>>> percent in the coarse solve will be a trivial percent of the run time. >>>>>>>>>>>> >>>>>>>>>>>> You should get pretty good speed up for 2 processes but not much >>>>>>>>>>>> better speedup for four processes because as Matt noted the computation is >>>>>>>>>>>> memory bandwidth limited; >>>>>>>>>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers. 
Note >>>>>>>>>>>> also that this is running multigrid which is a fast solver, but doesn't >>>>>>>>>>>> parallel scale as well many slow algorithms. For example if you run >>>>>>>>>>>> -dmmg_nlevels 5 -pc_type jacobi you will get great speed up with 2 >>>>>>>>>>>> processors but crummy speed. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Feb 2, 2011, at 6:17 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Barry, >>>>>>>>>>>>> >>>>>>>>>>>>> Please find attached the patch for the minor change to control the >>>>>>>>>>>>> number of elements from command line for snes/ex20.c. I know that >>>>>>>>>>>>> this >>>>>>>>>>>>> can be achieved with -grid_x etc from command_line but thought this >>>>>>>>>>>>> just made the typing for the refinement process a little easier. I >>>>>>>>>>>>> apologize if there was any confusion. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, find attached the full log summaries for -np=1 and -np=2. >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Feb 2, 2011 at 6:06 PM, Barry Smith >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need all the information from -log_summary to see what is going >>>>>>>>>>>>>> on. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Not sure what -grid 20 means but don't expect any good parallel >>>>>>>>>>>>>> performance with less than at least 10,000 unknowns per process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here's the performance statistic on 1 and 2 processor runs. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 >>>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>>>> Time (sec): 8.452e+00 1.00000 8.452e+00 >>>>>>>>>>>>>>> Objects: 1.470e+02 1.00000 1.470e+02 >>>>>>>>>>>>>>> Flops: 5.045e+09 1.00000 5.045e+09 5.045e+09 >>>>>>>>>>>>>>> Flops/sec: 5.969e+08 1.00000 5.969e+08 5.969e+08 >>>>>>>>>>>>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>>>>>>>>>>>> MPI Reductions: 4.440e+02 1.00000 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 >>>>>>>>>>>>>>> -log_summary >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Max Max/Min Avg Total >>>>>>>>>>>>>>> Time (sec): 7.851e+00 1.00000 7.851e+00 >>>>>>>>>>>>>>> Objects: 2.000e+02 1.00000 2.000e+02 >>>>>>>>>>>>>>> Flops: 4.670e+09 1.00580 4.657e+09 9.313e+09 >>>>>>>>>>>>>>> Flops/sec: 5.948e+08 1.00580 5.931e+08 1.186e+09 >>>>>>>>>>>>>>> MPI Messages: 7.965e+02 1.00000 7.965e+02 1.593e+03 >>>>>>>>>>>>>>> MPI Message Lengths: 1.412e+07 1.00000 1.773e+04 2.824e+07 >>>>>>>>>>>>>>> MPI Reductions: 1.046e+03 1.00000 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am not entirely sure if I can make sense out of that statistic >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> if there is something more you need, please feel free to let me >>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. 
Mahadevan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Matt, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The -with-debugging=1 option is certainly not meant for >>>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>>> studies but I didn't expect it to yield the same cpu time as a >>>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>>> processor for snes/ex20 i.e., my runs with 1 and 2 processors >>>>>>>>>>>>>>>>> take >>>>>>>>>>>>>>>>> approximately the same amount of time for computation of >>>>>>>>>>>>>>>>> solution. But >>>>>>>>>>>>>>>>> I am currently configuring without debugging symbols and shall >>>>>>>>>>>>>>>>> let you >>>>>>>>>>>>>>>>> know what that yields. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On a similar note, is there something extra that needs to be done >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> make use of multi-core machines while using MPI ? I am not sure >>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>> this is even related to PETSc but could be an MPI configuration >>>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>>> that maybe either I or the configure process is missing. All >>>>>>>>>>>>>>>>> ideas are >>>>>>>>>>>>>>>>> much appreciated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sparse MatVec (MatMult) is a memory bandwidth limited operation. >>>>>>>>>>>>>>>> On most >>>>>>>>>>>>>>>> cheap multicore machines, there is a single memory bus, and thus >>>>>>>>>>>>>>>> using more >>>>>>>>>>>>>>>> cores gains you very little extra performance. I still suspect you >>>>>>>>>>>>>>>> are not >>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>> running in parallel, because you usually see a small speedup. That >>>>>>>>>>>>>>>> is why I >>>>>>>>>>>>>>>> suggested looking at -log_summary since it tells you how many >>>>>>>>>>>>>>>> processes were >>>>>>>>>>>>>>>> run and breaks down the time. >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I am trying to configure my petsc install with an MPI >>>>>>>>>>>>>>>>>>> installation to >>>>>>>>>>>>>>>>>>> make use of a dual quad-core desktop system running Ubuntu. But >>>>>>>>>>>>>>>>>>> eventhough the configure/make process went through without >>>>>>>>>>>>>>>>>>> problems, >>>>>>>>>>>>>>>>>>> the scalability of the programs don't seem to reflect what I >>>>>>>>>>>>>>>>>>> expected. 
>>>>>>>>>>>>>>>>>>> My configure options are >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ >>>>>>>>>>>>>>>>>>> --download-mpich=1 >>>>>>>>>>>>>>>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g >>>>>>>>>>>>>>>>>>> --download-parmetis=1 --download-superlu_dist=1 >>>>>>>>>>>>>>>>>>> --download-hypre=1 >>>>>>>>>>>>>>>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ >>>>>>>>>>>>>>>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes >>>>>>>>>>>>>>>>>>> --with-debugging=1 --with-errorchecking=yes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1) For performance studies, make a build using >>>>>>>>>>>>>>>>>> --with-debugging=0 >>>>>>>>>>>>>>>>>> 2) Look at -log_summary for a breakdown of performance >>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is there something else that needs to be done as part of the >>>>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>>>> process to enable a decent scaling ? I am only comparing >>>>>>>>>>>>>>>>>>> programs with >>>>>>>>>>>>>>>>>>> mpiexec (-n 1) and (-n 2) but they seem to be taking >>>>>>>>>>>>>>>>>>> approximately the >>>>>>>>>>>>>>>>>>> same time as noted from -log_summary. If it helps, I've been >>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>> with snes/examples/tutorials/ex20.c for all purposes with a >>>>>>>>>>>>>>>>>>> custom >>>>>>>>>>>>>>>>>>> -grid parameter from command-line to control the number of >>>>>>>>>>>>>>>>>>> unknowns. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If there is something you've witnessed before in this >>>>>>>>>>>>>>>>>>> configuration or >>>>>>>>>>>>>>>>>>> if you need anything else to analyze the problem, do let me >>>>>>>>>>>>>>>>>>> know. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> is infinitely more interesting than any results to which their >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments >>>>>>>> is infinitely more interesting than any results to which their experiments >>>>>>>> lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >>> >> >> From jed at 59A2.org Thu Feb 3 20:33:48 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 3 Feb 2011 23:33:48 -0300 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. 
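A rough sketch of the kind of commands meant here; the exact option spellings vary between MPI releases, so check mpiexec --help or the man pages on the actual install before copying these:

# serial STREAM run pinned to one core with taskset (util-linux)
taskset -c 0 ./BasicVersion

# Open MPI: spread the two ranks across sockets and bind them there
# (newer releases spell this --map-by socket --bind-to socket; the 1.4/1.5-era
#  releases used --bysocket --bind-to-socket)
mpiexec -n 2 --bysocket --bind-to-socket ./ex20 -log_summary

# MPICH with the Hydra process manager: ask Hydra for socket binding
# (the flag is spelled -binding or -bind-to socket depending on the MPICH version)
mpiexec -n 2 -bind-to socket ./ex20 -log_summary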
On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. Barry * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: > Barry, > > The outputs are attached. I do... > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 3 20:54:58 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 3 Feb 2011 20:54:58 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: On Feb 3, 2011, at 8:33 PM, Jed Brown wrote: > Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. We should add these options to the FAQ.html memory bandwidth question for everyone to easily look up. Barry > > >> On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: >> >> >> Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >> >> Barry >> >> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >> >> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >> >> > Barry, >> > >> > The outputs are attached. I do... >> >> > >> > From vijay.m at gmail.com Thu Feb 3 21:09:45 2011 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 3 Feb 2011 21:09:45 -0600 Subject: [petsc-users] Configuring petsc with MPI on ubuntu quad-core In-Reply-To: References: <421C9DFF-35C8-47AE-9F09-BAF58D130BCF@mcs.anl.gov> <4E4DEB0D-E0AF-4712-8A92-89A00EF1DA95@mcs.anl.gov> Message-ID: I currently have it configured with mpich using --download-mpich. I have not yet tried the mpich-device option that Satish suggested. Jed, is there a configure option to include the Hydra manager during MPI install ? I can also go the OpenMPI route and install the official Ubuntu distribution to use with PETSc. On a side-note, I installed some performance monitor tools in Ubuntu (http://manpages.ubuntu.com/manpages/lucid/man1/perf-stat.1.html) and ran the BasicVersion benchmark with it. Here are the logs.
Performance counter stats for '/home/vijay/karma/contrib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./BasicVersion': 853.205576 task-clock-msecs # 0.996 CPUs 107 context-switches # 0.000 M/sec 1 CPU-migrations # 0.000 M/sec 12453 page-faults # 0.015 M/sec 2981125976 cycles # 3494.030 M/sec 2463421266 instructions # 0.826 IPC 33455540 cache-references # 39.212 M/sec 30304359 cache-misses # 35.518 M/sec 0.856807560 seconds time elapsed Performance counter stats for '/home/vijay/karma/contrib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./BasicVersion': 2904.477114 task-clock-msecs # 1.982 CPUs 533 context-switches # 0.000 M/sec 3 CPU-migrations # 0.000 M/sec 24728 page-faults # 0.009 M/sec 9904814141 cycles # 3410.188 M/sec 4932342066 instructions # 0.498 IPC 108666258 cache-references # 37.413 M/sec 105503187 cache-misses # 36.324 M/sec 1.465376789 seconds time elapsed There is clearly something fishy about this. Next I am going to restart the machine and try the same without the gui to see if the memory access improves without all the default background processes running. Vijay On Thu, Feb 3, 2011 at 8:54 PM, Barry Smith wrote: > > On Feb 3, 2011, at 8:33 PM, Jed Brown wrote: > >> Try telling your MPI to run each process on different sockets, or on the same socket with different caches. This is easy with Open MPI and with MPICH+Hydra. You can simply use taskset for serial jobs. > > ? We should add this options to the FAQ.html memory bandwidth question for everyone to easily look up. > > ? ?Barry > >> >> >>> On Feb 3, 2011 5:46 PM, "Barry Smith" wrote: >>> >>> >>> ? Based on these numbers (that is assuming these numbers are a correct accounting of how much memory bandwidth you can get from the system*) you essentially have a one processor machine that they sold to you as a 8 processor machine for sparse matrix computation. The one core run is using almost all the memory bandwidth, adding more cores in the computation helps very little because it is completely starved for memory bandwidth. >>> >>> ? Barry >>> >>> * perhaps something in the OS is not configured correctly and thus not allowing access to all the memory bandwidth, but this seems unlikely. >>> >>> On Feb 3, 2011, at 4:29 PM, Vijay S. Mahadevan wrote: >>> >>> > Barry, >>> > >>> > The outputs are attached. I do... >>> >>> > >>> >> > > From travis.fisher at nasa.gov Fri Feb 4 08:02:08 2011 From: travis.fisher at nasa.gov (Travis C. Fisher) Date: Fri, 4 Feb 2011 09:02:08 -0500 Subject: [petsc-users] Preconditioning in matrix free SNES Message-ID: <4D4C06E0.5070601@nasa.gov> I am trying to precondition a matrix free SNES solution. The application is a high order compressible Navier Stokes solver. My implementation is: I use the -snes_mf_operator option and create a matrix free matrix via MatCreateSNESMF. The function SNESSetJacobian points to (FormJacobian) uses MatMFFDComputeJacobian to calculate the matrix free jacobian. I then use SNESSetFromOptions and extract the linear solver and preconditioner. The linear solver is GMRES. The matrix free jacobian solve with no preconditioner works pretty well for the smooth test problem I am currently working on, but in general I don't expect it to perform that well without preconditioning. I have attempted to perform the preconditioning in two ways: 1) Simply setting the preconditioner extracted from the SNES to PCLU. 
The preconditioner matrix I calculate in my FormJacobian routine is a frozen Jacobian matrix, so for the first linear solve, the preconditioned operator is essentially the identity operator. The code performs an LU decomposition, but the effect of the preconditioner is a larger linear residual, so clearly something is wrong. 2) I set the preconditioner from the SNES context to PCSHELL. I then make calls to PCShellSetSetup and PCShellSetApply to assign the routines for setting up and applying the PC. The routine for PCSetUp creates a new LU preconditioner and sets the operator the frozen jacobian matrix as described above via PCSetOperators. The PCApply just calls PCApply. Again the code performs the LU decomposition, but I get the same result as above. I realize this may point to my preconditioner matrix being completely wrong, but it "looks" right when I check values. It would take a lot of effort for me to set up the coloring to have petsc calculate the jacobian via finite differences since I am using a high order stencil with full boundary closures. My question is am I obviously doing something incorrectly? Am I somehow failing to direct the SNES context to apply the LU decomposition and not assume that I have given it a preconditioner matrix to simply perform a matrix multiply? I appreciate any direction you may be able to give me. Thanks, Travis Fisher From jed at 59A2.org Fri Feb 4 08:40:47 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 4 Feb 2011 11:40:47 -0300 Subject: [petsc-users] Preconditioning in matrix free SNES In-Reply-To: <4D4C06E0.5070601@nasa.gov> References: <4D4C06E0.5070601@nasa.gov> Message-ID: On Fri, Feb 4, 2011 at 11:02, Travis C. Fisher wrote: > I am trying to precondition a matrix free SNES solution. The application is > a high order compressible Navier Stokes solver. > What sort of high-order methods? > My implementation is: > > I use the -snes_mf_operator option and create a matrix free matrix via > MatCreateSNESMF. The function SNESSetJacobian points to (FormJacobian) uses > MatMFFDComputeJacobian to calculate the matrix free jacobian. I then use > SNESSetFromOptions and extract the linear solver and preconditioner. The > linear solver is GMRES. The matrix free jacobian solve with no > preconditioner works pretty well for the smooth test problem I am currently > working on, but in general I don't expect it to perform that well without > preconditioning. I have attempted to perform the preconditioning in two > ways: > > 1) Simply setting the preconditioner extracted from the SNES to PCLU. The > preconditioner matrix I calculate in my FormJacobian routine is a frozen > Jacobian matrix, so for the first linear solve, the preconditioned operator > is essentially the identity operator. The code performs an LU decomposition, > but the effect of the preconditioner is a larger linear residual, so clearly > something is wrong. > The assembled matrix is likely incorrect, try using a tiny problem size and -snes_type test. See also SNESSetLagJacobian http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/SNES/SNESSetLagJacobian.html . I realize this may point to my preconditioner matrix being completely wrong, > but it "looks" right when I check values. > How are you creating it and how do you know what the values should "look" like? > It would take a lot of effort for me to set up the coloring to have petsc > calculate the jacobian via finite differences since I am using a high order > stencil with full boundary closures. 
> That's what -snes_type test is for. > My question is am I obviously doing something incorrectly? > It sounds right to me, but run with -snes_view to see what's happening. > Am I somehow failing to direct the SNES context to apply the LU decomposition and not assume that I have given it a preconditioner matrix to simply perform a matrix multiply? > This almost never makes sense and it would be very hard to do accidentally: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/PC/PCMAT.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From travis.fisher at nasa.gov Fri Feb 4 13:16:32 2011
From: travis.fisher at nasa.gov (Travis C. Fisher)
Date: Fri, 4 Feb 2011 14:16:32 -0500
Subject: [petsc-users] Preconditioning in matrix free SNES
In-Reply-To: 
References: 
Message-ID: <4D4C5090.3000209@nasa.gov>

Jed, Thanks for the response. I think I have resolved this particular issue. My jacobian was incorrect. I use generalized coordinates and I missed a scaling factor. I found this by generating the matrix finite difference coloring and calculating the approximate jacobian, which was much easier than I originally anticipated. The jacobian I am trying to create is exact for convective terms based on the numerical methods (ESWENO finite difference schemes). Travis

From jed at 59A2.org Sat Feb 5 08:24:11 2011
From: jed at 59A2.org (Jed Brown)
Date: Sat, 5 Feb 2011 09:24:11 -0500
Subject: [petsc-users] Preconditioning in matrix free SNES
In-Reply-To: 
References: <4D4C5090.3000209@nasa.gov>
Message-ID: 

I applaud your attention to detail if you worked out that Jacobian without AD. There has been some success preconditioning high order methods with nonlinear reconstruction using a matrix assembled with first order upwinding (sparser and less messy to work out). I would be interested to hear if that works well for you.

On Feb 4, 2011 8:17 PM, "Travis C. Fisher" wrote: Jed, Thanks for the response. I think I have resolved this particular issue. My jacobian was incorrect. I use generalized coordinates and I missed a scaling factor. I found this by generating the matrix finite difference coloring and calculating the approximate jacobian, which was much easier than I originally anticipated. The jacobian I am trying to create is exact for convective terms based on the numerical methods (ESWENO finite difference schemes). Travis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Robert.Ellis at geosoft.com Sat Feb 5 15:54:30 2011
From: Robert.Ellis at geosoft.com (Robert Ellis)
Date: Sat, 5 Feb 2011 21:54:30 +0000
Subject: [petsc-users] PC Shell Left, Right, Symm?
Message-ID: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>

Hello Experts,

When using a KSP Shell PreConditioner, is the LEFT, RIGHT, SYMMETRIC option applicable?

call KSPGetPC(ksp,pc,ierr)
call PCSetType(pc,PCSHELL,ierr)
call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr)
call PCShellSetApply(pc,JacobiShellPCApply,ierr)

The JacobiShellPCApply PreConditioner improves convergence considerably on my KSPCG problem. However, I have found empirically that using PC_LEFT, PC_RIGHT, PC_SYMMETRIC seems to have no effect on the convergence of the solution. Can anyone explain this unusual situation?

Thanks in advance for any help.
Cheers,
Rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
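For readers working in C, the same kind of shell preconditioner setup looks roughly like the sketch below (a sketch only: JacobiShellCtx and the stored vector are placeholder names, and the apply body is just a generic Jacobi-style scaling, not code from this thread):

typedef struct {
  Vec invdiag;   /* assumed to hold the reciprocal diagonal of the operator */
} JacobiShellCtx;

PetscErrorCode JacobiShellPCApply(PC pc, Vec x, Vec y)
{
  JacobiShellCtx *ctx;
  PetscErrorCode ierr;

  ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecPointwiseMult(y, ctx->invdiag, x);CHKERRQ(ierr);  /* y = diag^{-1} x */
  return 0;
}

/* usage sketch, assuming ksp already exists and its operators are set */
PC             pc;
JacobiShellCtx shellctx;   /* shellctx.invdiag filled in by the application */
PetscErrorCode ierr;

ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc, &shellctx);CHKERRQ(ierr);
ierr = PCShellSetApply(pc, JacobiShellPCApply);CHKERRQ(ierr);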
From bsmith at mcs.anl.gov Sat Feb 5 16:27:34 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 5 Feb 2011 16:27:34 -0600
Subject: [petsc-users] PC Shell Left, Right, Symm?
In-Reply-To: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>
References: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com>
Message-ID: <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov>

On Feb 5, 2011, at 3:54 PM, Robert Ellis wrote:
> Hello Experts,
> 
> When using a KSP Shell PreConditioner, is the LEFT, RIGHT, SYMMETRIC option applicable?
> 
> call KSPGetPC(ksp,pc,ierr)
> call PCSetType(pc,PCSHELL,ierr)
> call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr)
> call PCShellSetApply(pc,JacobiShellPCApply,ierr)
> 
> The JacobiShellPCApply PreConditioner improves convergence considerably on my KSPCG problem. However, I have found empirically that using PC_LEFT, PC_RIGHT, PC_SYMMETRIC seems to have no effect on the convergence of the solution. Can anyone explain this unusual situation?

1) Based on the call call KSPSetPreconditionerSide(ksp,PC_SYMMETRIC,ierr) above you are correctly trying to set it to use symmetric preconditioning, but it must be overwritten later because our CG doesn't have symmetric (or even right) preconditioning implemented, so it would generate an error message when it tries to use it. If you run with -ksp_view it will show the side being used. Perhaps the ksp in the code fragment above is not the ksp being used in your linear solve?

2) When left or right preconditioning is being used by the Krylov method the PC object doesn't know or care, it just applies the preconditioner, in this case your JacobiShellPCApply() routine. When PC_SYMMETRIC is used the application of the preconditioner is "split" into two parts, the application of the left part and the right part, symbolically as Bleft * A * Bright. This does not mean that PCApply() is simply called twice, once to apply the right part and once to apply the left part; instead PCApplySymmetricRight() is called and then PCApplySymmetricLeft(). For example, if one wished to use symmetric ICC incomplete Cholesky preconditioning then these two operators are transposes of each other. Thus there should be two additional functions, PCShellSetApplySymmetricLeft() and PCShellSetApplySymmetricRight(), allowing you to provide the two functions. PETSc doesn't currently have these but they could be trivially added. However, since our Krylov methods are not even implemented for symmetric application of the preconditioner, it wouldn't help you.

If you use the GMRES method (just to check) you can switch the preconditioning to either side and you will see different convergence behavior. In my experience using left or right preconditioning doesn't really matter much, but there are some people who swear that one is better than the other; and different people believe different things.

BTW: With PETSc's CG you can base your convergence test on either the preconditioned or nonpreconditioned residual norm; this is controlled with KSPSetNormType().

Barry

> 
> Thanks in advance for any help.
> Cheers,
> Rob

From jed at 59A2.org Sat Feb 5 16:45:57 2011
From: jed at 59A2.org (Jed Brown)
Date: Sat, 5 Feb 2011 17:45:57 -0500
Subject: [petsc-users] PC Shell Left, Right, Symm?
In-Reply-To: <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov> References: <18205E5ECD2A1A4584F2BFC0BCBDE95526D34C68@exchange.geosoft.com> <23409A70-AB1B-407F-9770-6C0C6690CDED@mcs.anl.gov> Message-ID: On Sat, Feb 5, 2011 at 17:27, Barry Smith wrote: > In my experience using left or right preconditioning doesn't really matter > much, but there are some people who swear that one is better than the other; > and different people believe different things. The important difference is whether the residuals are preconditioned or not, it is rarely in the speed of convergence (in my experience as well, but see note below). Left preconditioning causes the residuals to remove poor scaling such as penalty boundary conditions before the first residual is calculated. Right preconditioning shows you the unpreconditioned residuals. If you do a convergence test with unpreconditioned residuals (right preconditioning) and penalty boundary conditions, you might need a relative tolerance of 1e-12 on the first solve since the initial iterate does not satisfy boundary conditions, but then you might do a subsequent solve with an initial iterate that satisfies the boundary conditions, in which case you only need a relative tolerance of 1e-5 (or whatever tolerance you want in the interior). This is awkward, so left preconditioning (working with preconditioned residuals) makes sense with penalty boundary conditions. If instead you do a convergence test with preconditioned residuals (left preconditioning), but the preconditioner is singular (e.g. if you apply BoomerAMG directly to a mixed-FEM discretization of incompressible Navier-Stokes), it may erroneously appear to converge despite being nowhere near converged. In this case, right preconditioning makes sense because stagnation due to a singular preconditioner is clear. My opinion is that you should choose whichever preconditioner side evaluates residuals in the fom that is most meaningful for your discretization. Note: if you solve the block system J = [A B; C D] using the exact preconditioner P1 = [A B; 0 S] where S = D-C*inv(A)*B, then right preconditioned GMRES converges in 2 iterations while left preconditioning is not guaranteed to converge in any small number of iterations (though it may still in practice). If instead you use P2=[A 0;C S], then left-preconditioned GMRES converges in 2 while right-preconditioning does not guarantee a low iteration count. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenway at utias.utoronto.ca Sat Feb 5 18:19:50 2011 From: kenway at utias.utoronto.ca (Gaetan Kenway) Date: Sat, 5 Feb 2011 19:19:50 -0500 Subject: [petsc-users] Fortran external procedure in ctx() Message-ID: Hello I'm wondering if it is possible to put an external procedure reference in a ctx() in fortran. I'm in the process of writing a Newton--Krylov solver for an aero-structural system. My two different codes are wrapped with python and so each code is called through python for residual and preconditioning operations. Nominally this would be a good use of petsc4py but it doesn't allow for PCShell so it is no use to me. I then wrote up the solver in Fortran and attempted to use callbacks to python for computing the required information. Using f2py, I can pass my two call back functions cb1 and cb2 fortran. A schematic of the code is below: subroutine solver(cb1, cb2) ! cb1 and cb2 are python callbacks set using f2py external cb1,cb2 petscFortranAddress ctx(2) ! 
I would like to do the following, but this doesn't compile ! ctx(1) = cb1 ! ctx(2) = cb2 call SNESCreate(comm,snes,ierr) call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) call KSPGetPC(ksp,pc,ierr) call PCSetType(pc,PCSHELL,ierr) call PCShellSetContext(pc,ctx,ierr) call PCShellSetApply(pc,applyaspc_fortran,ierr) end subroutine solver subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) PC pc Vec inputVec,outputVec PetscFortranAddress ctx(2) external func call PCShellGetContext(pc,ctx,ierr) func = ctx(2) call VecGetArrayF90(inputVec,states_in,ierr) call VecGetArrayF90(outputVec,states_out,ierr) ! Call the callback to python call func(states_in,states_out,shape(states_in)) call VecRestoreArrayF90(inputVec,states_in,ierr) call VecRestoreArrayF90(outputVec,states_out,ierr) end subroutine applyaspc_fortran In general, in Fortran, is it possible to put an external function reference in a module such that I wouldn't have to try to pass it through the application ctx? I realize this may be impossible to do in Fortran. Would such a procedure be possible in C? I'm only using Fortran since I'm much more familiar with it then with C. Sorry there isn't much to go on, but any suggestions would be greatly appreciated. Gaetan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 5 19:28:52 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 5 Feb 2011 19:28:52 -0600 Subject: [petsc-users] Fortran external procedure in ctx() In-Reply-To: References: Message-ID: <6E6044B5-259F-4986-ACC2-600EF11EF529@mcs.anl.gov> On Feb 5, 2011, at 6:19 PM, Gaetan Kenway wrote: > Hello > > I'm wondering if it is possible to put an external procedure reference in a ctx() in fortran. > > I'm in the process of writing a Newton--Krylov solver for an aero-structural system. My two different codes are wrapped with python and so each code is called through python for residual and preconditioning operations. Nominally this would be a good use of petsc4py but it doesn't allow for PCShell so it is no use to me. Before making live hard by futzing around with Fortran or C lets make sure you really cannot do this in Python. What about using PCPYTHON? My guess is that this allows building your PC from pieces just like PCSHELL. Barry > > I then wrote up the solver in Fortran and attempted to use callbacks to python for computing the required information. Using f2py, I can pass my two call back functions cb1 and cb2 fortran. A schematic of the code is below: > > subroutine solver(cb1, cb2) > > ! cb1 and cb2 are python callbacks set using f2py > external cb1,cb2 > petscFortranAddress ctx(2) > > ! I would like to do the following, but this doesn't compile > ! ctx(1) = cb1 > ! ctx(2) = cb2 > > call SNESCreate(comm,snes,ierr) > call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) > > call KSPGetPC(ksp,pc,ierr) > call PCSetType(pc,PCSHELL,ierr) > > call PCShellSetContext(pc,ctx,ierr) > call PCShellSetApply(pc,applyaspc_fortran,ierr) > > end subroutine solver > > subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) > > PC pc > Vec inputVec,outputVec > PetscFortranAddress ctx(2) > external func > > call PCShellGetContext(pc,ctx,ierr) > func = ctx(2) > > call VecGetArrayF90(inputVec,states_in,ierr) > call VecGetArrayF90(outputVec,states_out,ierr) > > ! 
Call the callback to python > call func(states_in,states_out,shape(states_in)) > > call VecRestoreArrayF90(inputVec,states_in,ierr) > call VecRestoreArrayF90(outputVec,states_out,ierr) > end subroutine applyaspc_fortran > > > In general, in Fortran, is it possible to put an external function reference in a module such that I wouldn't have to try to pass it through the application ctx? I realize this may be impossible to do in Fortran. Would such a procedure be possible in C? I'm only using Fortran since I'm much more familiar with it then with C. > > Sorry there isn't much to go on, but any suggestions would be greatly appreciated. > > Gaetan From gaurish108 at gmail.com Sun Feb 6 02:11:35 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 6 Feb 2011 03:11:35 -0500 Subject: [petsc-users] BLAS library for PETSc Message-ID: How good is the BLAS library that PETSc downloads with the option "--download-f-blas-lapack=1 " during the installation step ? Is it recommended to use this BLAS library with PETSc or libraries such as ATLAS or GOTO? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sun Feb 6 02:21:22 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 6 Feb 2011 03:21:22 -0500 Subject: [petsc-users] BLAS library for PETSc In-Reply-To: References: Message-ID: The reference BLAS is "bad", but unless you are doing dense linear algebra or using a third party solver that uses BLAS level 3 internally (MUMPS), a tuned implementation will make little difference because relatively little time will be spent in BLAS, and those operations are memory bound (and can only be improved a modest amount by trickery). On Feb 6, 2011 9:11 AM, "Gaurish Telang" wrote: How good is the BLAS library that PETSc downloads with the option "--download-f-blas-lapack=1 " during the installation step ? Is it recommended to use this BLAS library with PETSc or libraries such as ATLAS or GOTO? -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Sun Feb 6 17:00:14 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Sun, 6 Feb 2011 17:00:14 -0600 Subject: [petsc-users] questions about the multigrid framework Message-ID: Hello, I have some concerns about the multigrid framework in PETSc. We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. The solver I used is a KSP solver in PETSc, which is set by calling : KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? 
Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. I did some research work on the website and found the slides by Barry on http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Feb 6 21:30:56 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 6 Feb 2011 21:30:56 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: Message-ID: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > Hello, I have some concerns about the multigrid framework in PETSc. > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 Barry We'll deal with the multigrid questions after we've resolved the more basic issues. > The solver I used is a KSP solver in PETSc, which is set by calling : > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). 
Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > I did some research work on the website and found the slides by Barry on > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > From ecoon at lanl.gov Mon Feb 7 10:26:21 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Mon, 07 Feb 2011 09:26:21 -0700 Subject: [petsc-users] Fortran external procedure in ctx() In-Reply-To: References: Message-ID: <1297095981.2263.9.camel@echo.lanl.gov> On Sat, 2011-02-05 at 19:19 -0500, Gaetan Kenway wrote: > Hello > > I'm wondering if it is possible to put an external procedure reference > in a ctx() in fortran. > > I'm in the process of writing a Newton--Krylov solver for an > aero-structural system. My two different codes are wrapped with python > and so each code is called through python for residual and > preconditioning operations. Nominally this would be a good use of > petsc4py but it doesn't allow for PCShell so it is no use to me. Maybe it's not clear to me what you're trying to do, but I think that petsc4py can make PCShells just fine. I've attached an example in pure python which uses petsc4py to generate both Mat and PC shells to solve the saddle point problem that arises from using Lagrange Multipliers to apply boundary conditions to Laplace's equation. Both the Schur complement and the full, block matrix are stored as Mat shells, and a PC shell is used to store the PC of the full matrix [[ A^-1, 0], [0, S]] and to do the inner solve required within the MatShell for S. Ethan > > I then wrote up the solver in Fortran and attempted to use callbacks > to python for computing the required information. Using f2py, I can > pass my two call back functions cb1 and cb2 fortran. A schematic of > the code is below: > > subroutine solver(cb1, cb2) > > ! cb1 and cb2 are python callbacks set using f2py > external cb1,cb2 > petscFortranAddress ctx(2) > > ! I would like to do the following, but this doesn't compile > ! ctx(1) = cb1 > ! 
ctx(2) = cb2 > > call SNESCreate(comm,snes,ierr) > call SNESSetFunction(snes,resVec,FormFunction,ctx,ierr) > > call KSPGetPC(ksp,pc,ierr) > call PCSetType(pc,PCSHELL,ierr) > > call PCShellSetContext(pc,ctx,ierr) > call PCShellSetApply(pc,applyaspc_fortran,ierr) > > end subroutine solver > > subroutine applyaspc_fortran(pc,inputVec,outputVec,ierr) > > PC pc > Vec inputVec,outputVec > PetscFortranAddress ctx(2) > external func > > call PCShellGetContext(pc,ctx,ierr) > func = ctx(2) > > call VecGetArrayF90(inputVec,states_in,ierr) > call VecGetArrayF90(outputVec,states_out,ierr) > > ! Call the callback to python > call func(states_in,states_out,shape(states_in)) > > call VecRestoreArrayF90(inputVec,states_in,ierr) > call VecRestoreArrayF90(outputVec,states_out,ierr) > end subroutine applyaspc_fortran > > > In general, in Fortran, is it possible to put an external function > reference in a module such that I wouldn't have to try to pass it > through the application ctx? I realize this may be impossible to do in > Fortran. Would such a procedure be possible in C? I'm only using > Fortran since I'm much more familiar with it then with C. > > Sorry there isn't much to go on, but any suggestions would be greatly > appreciated. > > Gaetan -------------- next part -------------- A non-text attachment was scrubbed... Name: lm_solver.py Type: text/x-python Size: 8826 bytes Desc: not available URL: From pengxwang at hotmail.com Mon Feb 7 10:49:28 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Mon, 7 Feb 2011 10:49:28 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: Thanks, Barry, I didn't run with with -ksp_monitor_true_residual -ksp_converged_reason. My own code was built based on the petsc-current/src/ksp/ksp/examples/tutorials/ex2f.F. Since the line of 248 which with KSPSetTolerances is commented out, it seems I didn't set the tolerance in my code. If I need to run with option -ksp_monitor_true_residual -ksp_converged_reason , I should add some lines like: call PetscOptionsHasName() call KSPGetConvergedReason() Am I right? In order to make the problem clear, I just attached the discription of my problem. Thanks a lot for any help from you. Following is the portion of the code with KSP solver. !============ call KSPCreate(MPI_COMM_WORLD,ksp,ierr) call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) call KSPSolve(ksp,b,x,ierr) call KSPGetIterationNumber(ksp,its,ierr) if (myid .eq. 0) then if (norm .gt. 1.e-12) then write(6,100) norm,its else write(6,110) its endif endif 100 format('Norm of error ',e10.4,' iterations ',i5) 110 format('Norm of error < 1.e-12,iterations ',i5) !============= > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. 
However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Problem_discription.pdf
Type: application/pdf
Size: 95775 bytes
Desc: not available
URL: 

From jed at 59A2.org Mon Feb 7 10:53:36 2011
From: jed at 59A2.org (Jed Brown)
Date: Mon, 7 Feb 2011 17:53:36 +0100
Subject: [petsc-users] questions about the multigrid framework
In-Reply-To: 
References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>
Message-ID: 

On Mon, Feb 7, 2011 at 17:49, Peter Wang wrote:
> If I need to run with option -ksp_monitor_true_residual -ksp_converged_reason , I should add some lines like:
> call PetscOptionsHasName ()
> call KSPGetConvergedReason()
> Am I right?

This is insane, just be sure to call KSPSetFromOptions() and then all the options will work.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From u.tabak at tudelft.nl Tue Feb 8 07:28:06 2011
From: u.tabak at tudelft.nl (Umut Tabak)
Date: Tue, 08 Feb 2011 14:28:06 +0100
Subject: [petsc-users] Operator matrix as a Matrix-free matrix
Message-ID: <4D5144E6.2090705@tudelft.nl>

Dear all,

I would like to create an operator matrix like

(I - 0.5 C^{-1}(C+kD))

for linear iterative solvers, where k is a given scalar and C and D are matrices from a FE discretization.

Moreover, the second part of the operator matrix can be constructed efficiently by using a matrix-vector product and a forward-backward substitution since I have the factorization of C which is a symmetric matrix.

I guess I should use matrix free operations and create the two matrices as shell matrices such as M1 (for I) and M2 (for the rest of above) and sum them, is this the most efficient way to do this?

Best wishes,
Umut

--
- Hope is a good thing, maybe the best of things
and no good thing ever dies...
The Shawshank Redemption, replique of Tim Robbins

From bsmith at mcs.anl.gov Tue Feb 8 07:38:59 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 8 Feb 2011 07:38:59 -0600
Subject: [petsc-users] Operator matrix as a Matrix-free matrix
In-Reply-To: <4D5144E6.2090705@tudelft.nl>
References: <4D5144E6.2090705@tudelft.nl>
Message-ID: <856AAEBD-892D-467A-9783-FE97757AAE8E@mcs.anl.gov>

On Feb 8, 2011, at 7:28 AM, Umut Tabak wrote:
> Dear all,
> 
> I would like to create an operator matrix like
> 
> (I - 0.5 C^{-1}(C+kD))
> 
> for linear iterative solvers, where k is a given scalar and C and D are matrices from a FE discretization.
> 
> Moreover, the second part of the operator matrix can be constructed efficiently by using a matrix-vector product and a forward-backward substitution since I have the factorization of C which is a symmetric matrix.
> 
> I guess I should use matrix free operations and create the two matrices as shell matrices such as M1 (for I) and M2 (for the rest of above) and sum them, is this the most efficient way to do this?

I would make a single MATSHELL, inside it I would store the k, the matrix C and the matrix D, in addition I would store in it a KSP object where I have called KSPSetOperators() with the C matrix. Then the PCApply for the shell matrix could be .5*( I - k* kspsolve(C)*D) if I have my math correct. No reason that I can see for having more than one shell matrix.

Barry

> Best wishes,
> Umut
> 
> --
> - Hope is a good thing, maybe the best of things
> and no good thing ever dies...
> The Shawshank Redemption, replique of Tim Robbins
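A minimal C sketch of the single shell matrix described above (a sketch only: ShellCtx, ShellMult and the size variables are assumed names; the multiply applies y = 0.5*(x - k*C^{-1}*D*x), the same algebra as .5*(I - k*kspsolve(C)*D); the four-argument KSPSetOperators() matches the PETSc releases current at the time of this thread):

typedef struct {
  Mat         C, D;   /* the FE matrices */
  PetscScalar k;
  KSP         ksp;    /* inner solver whose operator is C */
  Vec         work;   /* scratch vector with the same layout as x */
} ShellCtx;

PetscErrorCode ShellMult(Mat A, Vec x, Vec y)
{
  ShellCtx       *ctx;
  PetscErrorCode ierr;

  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->D, x, ctx->work);CHKERRQ(ierr);     /* work = D x        */
  ierr = KSPSolve(ctx->ksp, ctx->work, y);CHKERRQ(ierr);  /* y    = C^{-1} D x */
  ierr = VecAYPX(y, -ctx->k, x);CHKERRQ(ierr);            /* y    = x - k*y    */
  ierr = VecScale(y, 0.5);CHKERRQ(ierr);                  /* y    = 0.5*y      */
  return 0;
}

/* setup sketch; m,n are the local and M,N the global sizes of C */
ierr = KSPCreate(comm, &ctx.ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ctx.ksp, C, C, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ctx.ksp);CHKERRQ(ierr);
ierr = MatGetVecs(C, &ctx.work, PETSC_NULL);CHKERRQ(ierr);
ctx.C = C; ctx.D = D; ctx.k = k;
ierr = MatCreateShell(comm, m, n, M, N, &ctx, &A);CHKERRQ(ierr);
ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))ShellMult);CHKERRQ(ierr);

The shell matrix A can then be handed to an outer KSP just like an assembled matrix; only MATOP_MULT is provided here, so anything that needs more than matrix-vector products (for example a factorization-based preconditioner) would still have to come from elsewhere.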
From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 10:16:10 2011
From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann)
Date: Tue, 08 Feb 2011 17:16:10 +0100
Subject: [petsc-users] Howto evaluate function on grid
Message-ID: <4D516C4A.6040207@physik.uni-freiburg.de>

Dear all,

I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the program to higher dimensions and also unstructured grids. As I understand it now there are several candidates in PETSC for doing this:

1) PF
2) DALocalFunction
3) FIAT (?)

Could you please advise on what should be used? An additional problem is that several distinct functions should be evaluated at the same time due to the reuse of intermediate results.

Any help is appreciated!
Thanks in advance,
Klaus

------------------8<------------------8<------------------8<------------------8<------------------8<--------------

PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) {
  PetscErrorCode ierr;
  PetscInt i, j, cxs, cxm, cys, cym;
  PetscScalar **lpsi;
  PetscScalar **lGradPsi_x, **lGradPsi_y;
  Vec gc;
  DACoor2d **coors;
  DA cda;
  ierr = DAGetCoordinateDA(user->zGrid, &cda);CHKERRQ(ierr);
  ierr = DAGetCoordinates(user->zGrid, &gc);CHKERRQ(ierr);
  ierr = DAVecGetArray(cda, gc, &coors);CHKERRQ(ierr);
  ierr = DAGetCorners(cda, &cxs, &cys, PETSC_NULL, &cxm, &cym, PETSC_NULL);CHKERRQ(ierr);

  ierr = DAVecGetArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr);
  ierr = DAVecGetArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr);
  ierr = DAVecGetArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr);
  for(i=cys; i<cys+cym; i++) {
    for(j=cxs; j<cxs+cxm; j++) {
      PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y),
                y = PetscRealPart(coors[i][j].y);
      if(x>=0) {
        ierr = EvaluatePsiAndGradPsi(user, x, y,
                                     &(lpsi[i][j]),
                                     &(lGradPsi_x[i][j]),
                                     &(lGradPsi_y[i][j]));CHKERRQ(ierr);
      }
    }
  }
  ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr);
  ierr = DAVecRestoreArray(cda, gc, &coors);CHKERRQ(ierr);
  ierr = VecDestroy(gc);CHKERRQ(ierr);
  ierr = DADestroy(cda);CHKERRQ(ierr);
  return 0;
}

From bsmith at mcs.anl.gov Tue Feb 8 10:29:48 2011
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 8 Feb 2011 10:29:48 -0600
Subject: [petsc-users] Howto evaluate function on grid
In-Reply-To: <4D516C4A.6040207@physik.uni-freiburg.de>
References: <4D516C4A.6040207@physik.uni-freiburg.de>
Message-ID: 

On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote:
> Dear all,
> 
> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message).
> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the
> program to higher dimensions and also unstructured grids.

There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them.
> As I understand it now there are several > candidates in PETSC for doing this: > > 1) PF > 2) DALocalFunction Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() > 3) FIAT (?) > > Could you please advise on what should be used? > An additional problem is that several distinct functions should be evaluated at the same time due to the > due to the reuse of intermediate results. Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. Barry > > Any help is appreciated! > Thanks in advance, > Klaus > > ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- > > PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { > PetscErrorCode ierr; > PetscInt i, j, cxs, cxm, cys, cym; > PetscScalar **lpsi; > PetscScalar **lGradPsi_x, **lGradPsi_y; > Vec gc; > DACoor2d **coors; > DA cda; > ierr = DAGetCoordinateDA(user->zGrid, &cda);CHKERRQ(ierr); > ierr = DAGetCoordinates(user->zGrid, &gc);CHKERRQ(ierr); > ierr = DAVecGetArray(cda, gc, &coors);CHKERRQ(ierr); > ierr = DAGetCorners(cda, &cxs, &cys, PETSC_NULL, &cxm, &cym, PETSC_NULL);CHKERRQ(ierr); > > ierr = DAVecGetArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr); > ierr = DAVecGetArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr); > ierr = DAVecGetArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr); > for(i=cys; i for(j=cxs; j PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), > y = PetscRealPart(coors[i][j].y); > if(x>=0) { > ierr = EvaluatePsiAndGradPsi(user, x, y, > &(lpsi[i][j]), > &(lGradPsi_x[i][j]), > &(lGradPsi_y[i][j]));CHKERRQ(ierr); > } > } > } > ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x, &lGradPsi_x);CHKERRQ(ierr); > ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y, &lGradPsi_y);CHKERRQ(ierr); > ierr = DAVecRestoreArray(user->zGrid, user->psi, &lpsi);CHKERRQ(ierr); > ierr = DAVecRestoreArray(cda, gc, &coors);CHKERRQ(ierr); > ierr = VecDestroy(gc);CHKERRQ(ierr); > ierr = DADestroy(cda);CHKERRQ(ierr); > return 0; > } From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 10:52:01 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 17:52:01 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: References: <4D516C4A.6040207@physik.uni-freiburg.de> Message-ID: <4D5174B1.8070601@physik.uni-freiburg.de> Hi Barry, thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. The evaluation in more details goes like this: Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. I have a constant matrix C in the appctx. 
The three quantities I am interested in are then: 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) With this (hopefully better) description let me answer to your points below individually. On 02/08/2011 05:29 PM, Barry Smith wrote: > On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: > >> Dear all, >> >> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >> program to higher dimensions and also unstructured grids. > There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. I don't really need any solving. Do I still need different FormFunctions? >> As I understand it now there are several >> candidates in PETSC for doing this: >> >> 1) PF >> 2) DALocalFunction > Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. >> 3) FIAT (?) >> >> Could you please advise on what should be used? >> An additional problem is that several distinct functions should be evaluated at the same time due to the >> due to the reuse of intermediate results. > > Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. I am not really using any solving at the moment. Please let me know if you need more detail. Thanks again, Klaus > Barry > >> Any help is appreciated! 
>> Thanks in advance, >> Klaus >> >> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >> >> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >> PetscErrorCode ierr; >> PetscInt i, j, cxs, cxm, cys, cym; >> PetscScalar **lpsi; >> PetscScalar **lGradPsi_x, **lGradPsi_y; >> Vec gc; >> DACoor2d **coors; >> DA cda; >> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >> >> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >> for(i=cys; i> for(j=cxs; j> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >> y = PetscRealPart(coors[i][j].y); >> if(x>=0) { >> ierr = EvaluatePsiAndGradPsi(user, x, y, >> &(lpsi[i][j]), >> &(lGradPsi_x[i][j]), >> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >> } >> } >> } >> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >> ierr = VecDestroy(gc);CHKERRQ(ierr); >> ierr = DADestroy(cda);CHKERRQ(ierr); >> return 0; >> } From bsmith at mcs.anl.gov Tue Feb 8 11:00:41 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 8 Feb 2011 11:00:41 -0600 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <4D5174B1.8070601@physik.uni-freiburg.de> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> On Feb 8, 2011, at 10:52 AM, Klaus Zimmermann wrote: > Hi Barry, > > thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. > > The evaluation in more details goes like this: > Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. > I have a constant matrix C in the appctx. The three quantities I am interested in are then: > > 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) > 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) > 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) > How big are S? > With this (hopefully better) description let me answer to your points below individually. > On 02/08/2011 05:29 PM, Barry Smith wrote: >> On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: >> >>> Dear all, >>> >>> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >>> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >>> program to higher dimensions and also unstructured grids. >> There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. > This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. > I don't really need any solving. Do I still need different FormFunctions? 
NO > >>> As I understand it now there are several >>> candidates in PETSC for doing this: >>> >>> 1) PF >>> 2) DALocalFunction >> Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() > I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. > In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? > Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. You can keep them separate. You can have as many vector inputs and outputs you want (it is only the SNES solvers that need exactly one input and one output). Sometimes storing several vectors "interlaced" gives better performance since it uses the cache's better but that is only an optimization. If you separate the "iterater" part of the code from the "function" part then you can have a different iterator for structured and unstructured grid but reuse the "function" part. > >>> 3) FIAT (?) >>> >>> Could you please advise on what should be used? >>> An additional problem is that several distinct functions should be evaluated at the same time due to the >>> due to the reuse of intermediate results. >> >> Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. > I am not really using any solving at the moment. Please let me know if you need more detail. > > Thanks again, > Klaus > >> Barry >> >>> Any help is appreciated! 
>>> Thanks in advance, >>> Klaus >>> >>> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >>> >>> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >>> PetscErrorCode ierr; >>> PetscInt i, j, cxs, cxm, cys, cym; >>> PetscScalar **lpsi; >>> PetscScalar **lGradPsi_x, **lGradPsi_y; >>> Vec gc; >>> DACoor2d **coors; >>> DA cda; >>> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >>> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >>> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >>> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >>> >>> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>> for(i=cys; i>> for(j=cxs; j>> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >>> y = PetscRealPart(coors[i][j].y); >>> if(x>=0) { >>> ierr = EvaluatePsiAndGradPsi(user, x, y, >>> &(lpsi[i][j]), >>> &(lGradPsi_x[i][j]), >>> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >>> } >>> } >>> } >>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >>> ierr = VecDestroy(gc);CHKERRQ(ierr); >>> ierr = DADestroy(cda);CHKERRQ(ierr); >>> return 0; >>> } > From jed at 59A2.org Tue Feb 8 11:03:20 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 8 Feb 2011 18:03:20 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <4D5174B1.8070601@physik.uni-freiburg.de> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: On Tue, Feb 8, 2011 at 17:52, Klaus Zimmermann < klaus.zimmermann at physik.uni-freiburg.de> wrote: > I agree. This is mostly because I didn't understand the concepts so well at > the time I wrote this code and one of the reasons why I would like to > refactor. > In my case there should in principle be three output vectors. All the > facilities I have seen in petsc only deal with a single output vector. Is > this correct? > Of course there is an obvious mapping, but I would prefer to keep the > vectors apart because that way it is easier to deal with the parallel > layout. > Packing them together will give you better memory performance. You can extract separate pieces with the VecStride functions if you need it separate. If you have a really good reason for storing them separately, petsc-dev has VecNest which lets you treat several vectors as one, but some operations are more expensive and I would not recommend using it for your purposes. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 11:15:48 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 18:15:48 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> <8B8D3CD3-9970-4838-8C33-0FABE332A8C8@mcs.anl.gov> Message-ID: <4D517A44.8070502@physik.uni-freiburg.de> On 02/08/2011 06:00 PM, Barry Smith wrote: > On Feb 8, 2011, at 10:52 AM, Klaus Zimmermann wrote: > >> Hi Barry, >> >> thanks for your response. I guess my code excerpt wasn't very good, nor was my description. My apologies. >> >> The evaluation in more details goes like this: >> Depending on the coordinates x and y (and only on them) I calculate 4 vectors S1,...,S4. >> I have a constant matrix C in the appctx. The three quantities I am interested in are then: >> >> 1) u1 = (x+y)*VecTDot(S1, MatMult(C, S2)) >> 2) u2 = u1/(x+y) + (x+y)*VecTDot(S3, MatMult(C, S2)) >> 3) u3 = u1/(x+y) + (x+y)*VecTDot(S1, MatMult(C, S4)) >> > How big are S? Depending on parameter from 100 to 1000. Also C is dense. >> With this (hopefully better) description let me answer to your points below individually. >> On 02/08/2011 05:29 PM, Barry Smith wrote: >>> On Feb 8, 2011, at 10:16 AM, Klaus Zimmermann wrote: >>> >>>> Dear all, >>>> >>>> I want to evaluate a function on a grid. Right now I have some custom code (see the end of this message). >>>> I am now thinking of rewriting this using more PETSC facilities because in the future we want to extend the >>>> program to higher dimensions and also unstructured grids. >>> There is little similarity and little chance of code reuse between using a structured grid and an unstructured grid at the level of your function evaluations. So if you truly want to have a structured grid version and unstructured you should just have different FormFunctions for them. >> This is why I hoped to reuse code for the unstructured version: As long as I can call a method with coordinates and context I am fine. >> I don't really need any solving. Do I still need different FormFunctions? > NO Ok. >>>> As I understand it now there are several >>>> candidates in PETSC for doing this: >>>> >>>> 1) PF >>>> 2) DALocalFunction >>> Using the "local" function approach with DMMGSetSNESLocal() is just a way of hiding the DAVecGetArray() calls from the function code and is good when you have simple FormFunctions that do no rely on other vectors beside the usual input and output vectors. In your example below I do not understand why you have the input and output vectors inside the appctx, usually they are the input and output arguments to the formfunction set with SNESSetFunction() or DMMGSetSNES() >> I agree. This is mostly because I didn't understand the concepts so well at the time I wrote this code and one of the reasons why I would like to refactor. >> In my case there should in principle be three output vectors. All the facilities I have seen in petsc only deal with a single output vector. Is this correct? >> Of course there is an obvious mapping, but I would prefer to keep the vectors apart because that way it is easier to deal with the parallel layout. > You can keep them separate. You can have as many vector inputs and outputs you want (it is only the SNES solvers that need exactly one input and one output). 
Sometimes storing several vectors "interlaced" gives > better performance since it uses the cache's better but that is only an optimization. > > If you separate the "iterater" part of the code from the "function" part then you can have a different iterator for structured and unstructured grid but reuse the "function" part. So is there some general iterator code I could use? With regards to the vector layout: After the evaluation I want to calculate quantities like PointwiseMult(VecConjugate(u1),u2). I thought that for this it would be advantageous to have the output vectors laid out in the same way. Do you think the interleaved layout works as well? >>>> 3) FIAT (?) >>>> >>>> Could you please advise on what should be used? >>>> An additional problem is that several distinct functions should be evaluated at the same time due to the >>>> due to the reuse of intermediate results. >>> Are these functions associated with different SNES solvers or is the composition of these "several distinct functions" that defines the equations you are solving with PETSc? If the later then you just need to write either a single function that computes everything or have some smart use of inline to get good performance even with different functions. >> I am not really using any solving at the moment. Please let me know if you need more detail. >> >> Thanks again, >> Klaus >> >>> Barry >>> >>>> Any help is appreciated! >>>> Thanks in advance, >>>> Klaus >>>> >>>> ------------------8<------------------8<------------------8<------------------8<------------------8<-------------- >>>> >>>> PetscErrorCode EvaluatePsiAndGradPsi(AppCtx *user) { >>>> PetscErrorCode ierr; >>>> PetscInt i, j, cxs, cxm, cys, cym; >>>> PetscScalar **lpsi; >>>> PetscScalar **lGradPsi_x, **lGradPsi_y; >>>> Vec gc; >>>> DACoor2d **coors; >>>> DA cda; >>>> ierr = DAGetCoordinateDA(user->zGrid,&cda);CHKERRQ(ierr); >>>> ierr = DAGetCoordinates(user->zGrid,&gc);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(cda, gc,&coors);CHKERRQ(ierr); >>>> ierr = DAGetCorners(cda,&cxs,&cys, PETSC_NULL,&cxm,&cym, PETSC_NULL);CHKERRQ(ierr); >>>> >>>> ierr = DAVecGetArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>>> ierr = DAVecGetArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>>> for(i=cys; i<cys+cym; i++) { >>>> for(j=cxs; j<cxs+cxm; j++) { >>>> PetscReal x = PetscRealPart(coors[i][j].x-coors[i][j].y), >>>> y = PetscRealPart(coors[i][j].y); >>>> if(x>=0) { >>>> ierr = EvaluatePsiAndGradPsi(user, x, y, >>>> &(lpsi[i][j]), >>>> &(lGradPsi_x[i][j]), >>>> &(lGradPsi_y[i][j]));CHKERRQ(ierr); >>>> } >>>> } >>>> } >>>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_x,&lGradPsi_x);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(user->zGrid, user->gradPsi_y,&lGradPsi_y);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(user->zGrid, user->psi,&lpsi);CHKERRQ(ierr); >>>> ierr = DAVecRestoreArray(cda, gc,&coors);CHKERRQ(ierr); >>>> ierr = VecDestroy(gc);CHKERRQ(ierr); >>>> ierr = DADestroy(cda);CHKERRQ(ierr); >>>> return 0; >>>> } From klaus.zimmermann at physik.uni-freiburg.de Tue Feb 8 11:17:43 2011 From: klaus.zimmermann at physik.uni-freiburg.de (Klaus Zimmermann) Date: Tue, 08 Feb 2011 18:17:43 +0100 Subject: [petsc-users] Howto evaluate function on grid In-Reply-To: References: <4D516C4A.6040207@physik.uni-freiburg.de> <4D5174B1.8070601@physik.uni-freiburg.de> Message-ID: <4D517AB7.6060205@physik.uni-freiburg.de> On 02/08/2011 06:03 PM, Jed Brown wrote: > On Tue, Feb 8, 2011 at 17:52, Klaus Zimmermann > >
wrote: > > I agree. This is mostly because I didn't understand the concepts > so well at the time I wrote this code and one of the reasons why I > would like to refactor. > In my case there should in principle be three output vectors. All > the facilities I have seen in petsc only deal with a single output > vector. Is this correct? > Of course there is an obvious mapping, but I would prefer to keep > the vectors apart because that way it is easier to deal with the > parallel layout. > > > Packing them together will give you better memory performance. You can > extract separate pieces with the VecStride functions if you need it > separate. If you have a really good reason for storing them > separately, petsc-dev has VecNest which lets you treat several vectors > as one, but some operations are more expensive and I would not > recommend using it for your purposes. Thanks for the info. I guess I'll have them interleaved then and extract the components for the global calculations afterwards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 09:58:34 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 09:58:34 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: Thanks Barry, I ran the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation did not really converge. After I set a smaller rtol and allowed more iterations, the numerical solution gets better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varies mainly in y direction) at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. and it didn't converge yet. I am wondering whether, if the solver is changed, the convergence speed could get faster? Or should I take another approach to use finer grids, like multigrid? Thanks for your help. > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here.
99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:00:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:00:37 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, > and it turns out that the computation didn't get the real convergence. > After I set the rtol and more iteration, the numerical solution get better. > However, the computation converges very slowly with finer grid points. For > example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution > varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| > 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| > 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get > fater? Or, I should take anohte approach to use finer grids, like multigrid? > Thanks for your help. > If you can get MG to work for your problem, its optimal. All the Krylov methods alone will get worse with increasing grid size. 
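For reference, the convergence checks recommended in this thread are ordinary runtime options; a typical run of an application that calls KSPSetFromOptions() might look like the line below (the executable name and process count are placeholders, not from the thread).

  mpiexec -n 4 ./my_solver -ksp_type gmres -ksp_rtol 1e-12 -ksp_monitor_true_residual -ksp_converged_reason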
Matt > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety > in length scales. The length of computational domain is in order of 1e3 m, > and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in > a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number > of uniform or non-uniform grids. However, the error of the numerical > solution increases when the number of the grid is too large. In order to > test the effect of the grid size on the solution, a domain with regular > scale of 1m by 1m was tried to solve. It is found that the extreme small > grid size might lead to large variation to the exact solution. For example, > the exact solution is a linear distribution in the domain. The numerical > solution is linear as similar as the exact solution when the grid number is > nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the > numerical solution varies to nonlinear distribution which boundary is the > only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, > with a finer grid your solution (for a problem with a known solution for > example) will be more accurate and won't suddenly get less accurate with a > finer mesh. > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to > make sure that it is converging? and using a smaller -ksp_rtol for > more grid points. For example with 10,000 grid points in each direction and > no better idea of what the discretization error is I would use a tol of > 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more > basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this > solver is not suitable to the system with small size grid? Or, whether the > problem crossing 6 orders of length scale is solvable with only one level > grid system when the memory is enough for large matrix? Since there is less > coding work for one level grid size, it would be easy to implement the > solver. > > > > > > I did some research work on the website and found the slides by Barry > on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach > to our problem. We are thinking to turn to the multigrid framework in PETSc > to solve the problem. However, before we dig into it, there are some issues > confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large > variety in length scales (up to 6 orders)? Is DMMG is the best tool for our > problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were > created for the finite difference scheme of the domain and solved by KSP > solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it > easy to immigrate the created Matrix A and Vector b to the multigrid > framework? 
> > > > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From adube1 at tigers.lsu.edu Wed Feb 9 10:10:48 2011 From: adube1 at tigers.lsu.edu (Anuj Dube) Date: Wed, 9 Feb 2011 10:10:48 -0600 Subject: [petsc-users] Regarding Installation Message-ID: Dear Sir/ Madam I have downloaded the zip folder but it seems that the setup file is .py and I do not have python on my system. Is there a way to install PETSc to my windows system without downloading Python? -- Anuj Dube Dept. of Computer Science Louisiana State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:31:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:31:37 -0600 Subject: [petsc-users] Regarding Installation In-Reply-To: References: Message-ID: On Wed, Feb 9, 2011 at 10:10 AM, Anuj Dube wrote: > Dear Sir/ Madam > > I have downloaded the zip folder but it seems that that the setup file is > .py and I do not have python on my system. Is there a way to install PETSc > to my windows system without downloading Python? No. The configuration system uses Python and the build system uses Cygwin (which has Python). Matt > > -- > > > Anuj Dube > Dept. of Computer Science > Louisiana State University > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Feb 9 10:44:24 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 9 Feb 2011 10:44:24 -0600 (CST) Subject: [petsc-users] Regarding Installation In-Reply-To: References: Message-ID: On Wed, 9 Feb 2011, Matthew Knepley wrote: > On Wed, Feb 9, 2011 at 10:10 AM, Anuj Dube wrote: > > > Dear Sir/ Madam > > > > I have downloaded the zip folder but it seems that that the setup file is > > .py and I do not have python on my system. Is there a way to install PETSc > > to my windows system without downloading Python? > > > No. The configuration system uses Python and the build system uses Cygwin > (which has Python). i.e. you need cygwin-python and other cygwin tools - not regular/win-python. Check the installation instructions. Satish From pengxwang at hotmail.com Wed Feb 9 10:44:24 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 10:44:24 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , Message-ID: Thanks, Matt, Did you mean all the Krylov methods alone will get worse with an increasing number of grid points? Since the finer grid has a smaller grid size and a larger number of grid points. Since I am a new user of PETSc, the easiest way for me is still to keep using the KSP solver. However, if the solver cannot satisfy the speed requirement, I am thinking of using the MG method. However, I don't have any experience with multigrid. Could you please give me some suggestions on it?
1, Since I have built the Matrix and the vector for finite difference scheme in KSP solver, where should I start from to transfer to multigrid? I studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good prototype to be based on to create my own code? Is DMMG is the best tool for my problem? 2, How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? 3, The procedure of building Matrix and RHS vector in MG method is to build the matrix and RHS in the finest grid level and the MG will start the computation from the coarsest level, right? Thanks for your considerate reponse. Date: Wed, 9 Feb 2011 10:00:37 -0600 From: knepley at gmail.com To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] questions about the multigrid framework On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: Thanks Barry, I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. and it didn't converge yet. I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. If you can get MG to work for your problem, its optimal. All the Krylov methods alone will get worse with increasing grid size. Matt > From: bsmith at mcs.anl.gov > Date: Sun, 6 Feb 2011 21:30:56 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. 
For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > Barry > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > I did some research work on the website and found the slides by Barry on > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 9 10:53:00 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Feb 2011 10:53:00 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 10:44 AM, Peter Wang wrote: > Thank, Matt, > > Did you mean All the Krylov methods alone will get worse with increasing > grid number? Since the finer grid has smaller size and more number of grid. > > Since I am a new user of PETSc, the easiest way for me is still keep in > KSP solver. However, if the solver cannot satisfy the speed reqirement. I am > thinking to use MG method. However, I don't have any experience > on multigrid. Could you please give me some suggestion on it? > The best thing to do is get a book about solvers and preconditioners. All your questions depend on what type of operator you are trying to invert. I recommend Saad for Iteravtive Methods and maybe Briggs for an intro to MG. Barry's book has a good overview of Domain Decomposition. Thanks, Matt > > 1, Since I have built the Matrix and the vector for finite difference > scheme in KSP solver, where should I start from to transfer to multigrid? I > studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good > prototype to be based on to create my own code? Is DMMG is the best tool > for my problem? 
> > > 2, How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > 3, The procedure of building Matrix and RHS vector in MG method is to > build the matrix and RHS in the finest grid level and the MG will start the > computation from the coarsest level, right? > > Thanks for your considerate reponse. > > > > > ------------------------------ > Date: Wed, 9 Feb 2011 10:00:37 -0600 > From: knepley at gmail.com > > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > On Wed, Feb 9, 2011 at 9:58 AM, Peter Wang wrote: > > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, > and it turns out that the computation didn't get the real convergence. > After I set the rtol and more iteration, the numerical solution get better. > However, the computation converges very slowly with finer grid points. For > example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution > varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| > 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| > 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get > fater? Or, I should take anohte approach to use finer grids, like multigrid? > Thanks for your help. > > > If you can get MG to work for your problem, its optimal. All the Krylov > methods alone will get worse with increasing grid size. > > Matt > > > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety > in length scales. The length of computational domain is in order of 1e3 m, > and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in > a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number > of uniform or non-uniform grids. However, the error of the numerical > solution increases when the number of the grid is too large. In order to > test the effect of the grid size on the solution, a domain with regular > scale of 1m by 1m was tried to solve. It is found that the extreme small > grid size might lead to large variation to the exact solution. For example, > the exact solution is a linear distribution in the domain. The numerical > solution is linear as similar as the exact solution when the grid number is > nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the > numerical solution varies to nonlinear distribution which boundary is the > only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, > with a finer grid your solution (for a problem with a known solution for > example) will be more accurate and won't suddenly get less accurate with a > finer mesh. > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to > make sure that it is converging? and using a smaller -ksp_rtol for > more grid points. 
For example with 10,000 grid points in each direction and > no better idea of what the discretization error is I would use a tol of > 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more > basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this > solver is not suitable to the system with small size grid? Or, whether the > problem crossing 6 orders of length scale is solvable with only one level > grid system when the memory is enough for large matrix? Since there is less > coding work for one level grid size, it would be easy to implement the > solver. > > > > > > I did some research work on the website and found the slides by Barry > on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach > to our problem. We are thinking to turn to the multigrid framework in PETSc > to solve the problem. However, before we dig into it, there are some issues > confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large > variety in length scales (up to 6 orders)? Is DMMG is the best tool for our > problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were > created for the finite difference scheme of the domain and solved by KSP > solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it > easy to immigrate the created Matrix A and Vector b to the multigrid > framework? > > > > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 9 10:54:19 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 9 Feb 2011 17:54:19 +0100 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: On Wed, Feb 9, 2011 at 17:44, Peter Wang wrote: > Did you mean All the Krylov methods alone will get worse with increasing > grid number? > Yes, the number of Krylov iterations for second order elliptic problems with no preconditioner scales proportional to the number of grid points in any direction. You need a spectrally equivalent preconditioner, usually multigrid of some sort, to prevent this. > Since the finer grid has smaller size and more number of grid. > > Since I am a new user of PETSc, the easiest way for me is still keep in > KSP solver. However, if the solver cannot satisfy the speed reqirement. I am > thinking to use MG method. However, I don't have any experience > on multigrid. Could you please give me some suggestion on it? > > 1, Since I have built the Matrix and the vector for finite difference > scheme in KSP solver, where should I start from to transfer to multigrid? I > studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. 
Is it a good > prototype to be based on to create my own code? Is DMMG is the best tool > for my problem? > Assuming you currently assemble a matrix, just configure PETSc with --download-ml and --download-hypre, then try running your code with -pc_type ml or -pc_type hypre. You can use geometric multigrid later to improve the constants or handle cases where algebraic multigrid (ML or BoomerAMG from Hypre) are having trouble. You need to tell us what equations you are solving if you want useful suggestions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 11:22:59 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 11:22:59 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , , , Message-ID: Thanks a lot, Jed, I will try the algebraic multigrid first. In order to configure PETSc with --download-ml and --download-hypre, which shell file in unix should I modify? Should I add some line in my current code to run with -pc_type ml or -pc_type hypre, or just use a runtime option? I am solving a 2-D Poisson equation with a finite difference scheme. Please find the problem description attached if it is necessary. Thanks again. Date: Wed, 9 Feb 2011 17:54:19 +0100 From: jed at 59A2.org To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] questions about the multigrid framework On Wed, Feb 9, 2011 at 17:44, Peter Wang wrote: Did you mean All the Krylov methods alone will get worse with increasing grid number? Yes, the number of Krylov iterations for second order elliptic problems with no preconditioner scales proportional to the number of grid points in any direction. You need a spectrally equivalent preconditioner, usually multigrid of some sort, to prevent this. Since the finer grid has smaller size and more number of grid. Since I am a new user of PETSc, the easiest way for me is still keep in KSP solver. However, if the solver cannot satisfy the speed reqirement. I am thinking to use MG method. However, I don't have any experience on multigrid. Could you please give me some suggestion on it? 1, Since I have built the Matrix and the vector for finite difference scheme in KSP solver, where should I start from to transfer to multigrid? I studied the example in: src/ksp/ksp/examples/tutorials/ex22f.F. Is it a good prototype to be based on to create my own code? Is DMMG is the best tool for my problem? Assuming you currently assemble a matrix, just configure PETSc with --download-ml and --download-hypre, then try running your code with -pc_type ml or -pc_type hypre. You can use geometric multigrid later to improve the constants or handle cases where algebraic multigrid (ML or BoomerAMG from Hypre) are having trouble. You need to tell us what equations you are solving if you want useful suggestions. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
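To make the mechanics concrete: nothing in the application source has to change as long as it calls KSPSetFromOptions(); the external packages are added when PETSc is configured, and the preconditioner is then chosen on the command line. The lines below are only a sketch; the existing configure options, process count and executable name are placeholders.

  ./configure --download-ml --download-hypre [...the other options already in use...]
  make
  mpiexec -n 4 ./my_solver -pc_type ml -ksp_monitor_true_residual -ksp_converged_reason
  mpiexec -n 4 ./my_solver -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor_true_residual -ksp_converged_reason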
Name: Problem_discription.pdf Type: application/pdf Size: 101028 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 9 11:36:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 9 Feb 2011 11:36:47 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: References: , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov> Message-ID: <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> On Feb 9, 2011, at 9:58 AM, Peter Wang wrote: > Thanks Barry, > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. > and it didn't converge yet. > > I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. You have a little confusion here. Multigrid (in the context of PETSc and numerical solvers) is ONLY an efficient way to solve a set of linear equations arising from discretizing a PDE. It is not a different way of discretizing the PDEs or giving a different or better solution. It is only a way of getting the same solution (potentially much) faster than running the slower convergent solver. Definitely configure PETSc with --download-ml --download-hypre and make runs using -pc_type hypre and then -pc_type ml to see how algebraic multigrid works, it should work fine for your problem. Barry > > > > From: bsmith at mcs.anl.gov > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > To: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. 
> > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > > > Barry > > > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > > > I did some research work on the website and found the slides by Barry on > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? > > > > > > 3 How many levels of the subgrid are needed to obtain a solution close enough to the exact solution for a problem with 6 orders in length scale? > > > > > From jdbst21 at gmail.com Wed Feb 9 13:02:00 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Wed, 9 Feb 2011 14:02:00 -0500 Subject: [petsc-users] Metis NodeND in Petsc Message-ID: Hello, I have been looking for an easy was to use Metis's NodeND in Petsc in a similar fashion as kways. I was wondering if there is some way to go about this using the metis interface. Josh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pflath at ices.utexas.edu Wed Feb 9 14:22:15 2011 From: pflath at ices.utexas.edu (Pearl Flath) Date: Wed, 9 Feb 2011 14:22:15 -0600 Subject: [petsc-users] Matrix free SNES and Jacobian, function evaluations Message-ID: Dear All, I'd like to use the SNES nonlinear solvers. For my problem, evaluation of the Jacobian and the right hand side function involves some identical steps. I'd prefer not to repeat those, and instead have the Jacobian and function calculation share some computations. Is there a way to do this? Does SNES consistently evaluate one of them first, and thus I could have the other one re-use that information? Or is there a way to tell SNES to call a general update at each step before evaluating the Jacobian and function? Many thanks. -------------- next part -------------- An HTML attachment was scrubbed... 
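The reply that follows suggests caching the shared work in the user context during the function evaluation and reusing it when the Jacobian is requested. A rough sketch of that pattern with the PETSc 3.1-era callback signatures; the AppCtx layout and the ExpensiveShared() helper are made-up names, not part of PETSc or of this thread.

typedef struct {
  Vec shared;                              /* intermediate quantity needed by both callbacks */
} AppCtx;

PetscErrorCode FormFunction(SNES snes,Vec x,Vec f,void *ctx)
{
  AppCtx        *user = (AppCtx*)ctx;
  PetscErrorCode ierr;
  ierr = ExpensiveShared(x,user->shared);CHKERRQ(ierr); /* compute the common piece once */
  /* ... assemble f from x and user->shared ... */
  return 0;
}

PetscErrorCode FormJacobian(SNES snes,Vec x,Mat *A,Mat *B,MatStructure *flag,void *ctx)
{
  AppCtx *user = (AppCtx*)ctx;
  /* in the usual Newton flow the most recent function evaluation was at this same x,
     so user->shared can be reused here instead of being recomputed */
  /* ... assemble *B from x and user->shared ... */
  *flag = SAME_NONZERO_PATTERN;
  return 0;
}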
URL: From jed at 59A2.org Wed Feb 9 14:31:11 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 9 Feb 2011 21:31:11 +0100 Subject: [petsc-users] Matrix free SNES and Jacobian, function evaluations In-Reply-To: References: Message-ID: The function is always evaluated before the jacobian, but sometimes the function is evaluated at several places before a jacobian is needed (e.g. in a line search). You can cache any information you want in the user context during function evaluation and use it to speed up jacobian evaluation. On Feb 9, 2011 9:22 PM, "Pearl Flath" wrote: Dear All, I'd like to use the SNES nonlinear solvers. For my problem, evaluation of the Jacobian and the right hand side function involves some identical steps. I'd prefer not to repeat those, and instead have the Jacobian and function calculation share some computations. Is there a way to do this? Does SNES consistently evaluate one of them first, and thus I could have the other one re-use that information? Or is there a way to tell SNES to call a general update at each step before evaluating the Jacobian and function? Many thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pengxwang at hotmail.com Wed Feb 9 16:29:48 2011 From: pengxwang at hotmail.com (Peter Wang) Date: Wed, 9 Feb 2011 16:29:48 -0600 Subject: [petsc-users] questions about the multigrid framework In-Reply-To: <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> References: , , <465942DD-1195-4D5C-A40C-665FF0FEFCCC@mcs.anl.gov>, , <6A8840D2-4317-4A31-997D-242F898F32AD@mcs.anl.gov> Message-ID: Thanks a lot, Barry and Jed. Your explain is very clear and informative. Your suggestions make me move forward to my goal smoothly. I will try it. > From: bsmith at mcs.anl.gov > Date: Wed, 9 Feb 2011 11:36:47 -0600 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] questions about the multigrid framework > > > On Feb 9, 2011, at 9:58 AM, Peter Wang wrote: > > > Thanks Barry, > > > > I run the code with -ksp_monitor_true_residual -ksp_converged_reason, and it turns out that the computation didn't get the real convergence. After I set the rtol and more iteration, the numerical solution get better. However, the computation converges very slowly with finer grid points. For example, with nx=2500 and ny=10000, (lx=2.5e-4,ly=1e-3, and the distribution varys mainly in y direction) > > at IT=72009, true resid norm 1.638857052871e-01 ||Ae||/||Ax|| 9.159199925235e-07 > > IT=400000,true resid norm 1.638852449299e-01 ||Ae||/||Ax|| 9.159174196917e-07. > > and it didn't converge yet. > > > > I am wondering if the solver is changed, the convergency speed could get fater? Or, I should take anohte approach to use finer grids, like multigrid? Thanks for your help. > > You have a little confusion here. Multigrid (in the context of PETSc and numerical solvers) is ONLY an efficient way to solve a set of linear equations arising from discretizing a PDE. It is not a different way of discretizing the PDEs or giving a different or better solution. It is only a way of getting the same solution (potentially much) faster than running the slower convergent solver. > > Definitely configure PETSc with --download-ml --download-hypre and make runs using -pc_type hypre and then -pc_type ml to see how algebraic multigrid works, it should work fine for your problem. 
> > Barry > > > > > > > > > From: bsmith at mcs.anl.gov > > > Date: Sun, 6 Feb 2011 21:30:56 -0600 > > > To: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] questions about the multigrid framework > > > > > > > > > On Feb 6, 2011, at 5:00 PM, Peter Wang wrote: > > > > > > > Hello, I have some concerns about the multigrid framework in PETSc. > > > > > > > > We are trying to solve a two dimensional problem with a large variety in length scales. The length of computational domain is in order of 1e3 m, and the width is in 1 m, nevertheless, there is a tiny object with 1e-3 m in a corner of the domain. > > > > > > > > As a first thinking, we tried to solve the problem with a larger number of uniform or non-uniform grids. However, the error of the numerical solution increases when the number of the grid is too large. In order to test the effect of the grid size on the solution, a domain with regular scale of 1m by 1m was tried to solve. It is found that the extreme small grid size might lead to large variation to the exact solution. For example, the exact solution is a linear distribution in the domain. The numerical solution is linear as similar as the exact solution when the grid number is nx=1000 by ny=1000. However, if the grid number is nx=10000 by ny=10000, the numerical solution varies to nonlinear distribution which boundary is the only same as the exact solution. > > > > > > Stop right here. 99.9% of the time what you describe should not happen, with a finer grid your solution (for a problem with a known solution for example) will be more accurate and won't suddenly get less accurate with a finer mesh. > > > > > > Are you running with -ksp_monitor_true_residual -ksp_converged_reason to make sure that it is converging? and using a smaller -ksp_rtol for more grid points. For example with 10,000 grid points in each direction and no better idea of what the discretization error is I would use a tol of 1.e-12 > > > > > > Barry > > > > > > We'll deal with the multigrid questions after we've resolved the more basic issues. > > > > > > > > > > The solver I used is a KSP solver in PETSc, which is set by calling : > > > > KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr). Whether this solver is not suitable to the system with small size grid? Or, whether the problem crossing 6 orders of length scale is solvable with only one level grid system when the memory is enough for large matrix? Since there is less coding work for one level grid size, it would be easy to implement the solver. > > > > > > > > I did some research work on the website and found the slides by Barry on > > > > http://www.mcs.anl.gov/petsc/petsc-2/documentation/tutorials/Columbia04/DDandMultigrid.pdf > > > > It seems that the multigrid framework in PETSc is a possible approach to our problem. We are thinking to turn to the multigrid framework in PETSc to solve the problem. However, before we dig into it, there are some issues confusing us. It would be great if we can get any suggestion from you: > > > > 1 Whether the multigrid framework can handle the problem with a large variety in length scales (up to 6 orders)? Is DMMG is the best tool for our problem? > > > > > > > > 2 The coefficient matrix A and the right hand side vector b were created for the finite difference scheme of the domain and solved by KSP solver (callKSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)). Is it easy to immigrate the created Matrix A and Vector b to the multigrid framework? 
> > > > 3 How many levels of the subgrid are needed to obtain a solution close > enough to the exact solution for a problem with 6 orders in length scale? > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenway at utias.utoronto.ca Wed Feb 9 17:43:59 2011 From: kenway at utias.utoronto.ca (Gaetan Kenway) Date: Wed, 9 Feb 2011 18:43:59 -0500 Subject: [petsc-users] PETSc MatMFFDSetFunction Function Message-ID: Hello I was wondering what the user supplied function is supposed to look like for setting the function in MatMFFDSetFunction. I am trying to use a Matrix-Free Matrix for a linear Krylov Solver. The website says: PetscErrorCode PETSCMAT_DLLEXPORT MatMFFDSetFunction(Mat mat,PetscErrorCode (*func)(void*,Vec,Vec),void *funcctx) This indicates the function should have the calling sequence: (void *,Vec,Vec). Since there are zero examples of actually using this function, what exactly is the sequence? I gather that the second and third arguments are Vec x and Vec y where x is the input and y is the output, but what is the void * supposed to be? I'm doing this in Fortran, so I really don't know what argument "void *" should correspond to. Currently my code looks like this: ! Setup Matrix-Free dRdw matrix call MatCreateMFFD(sumb_comm_world,nDimW,nDimW,& PETSC_DETERMINE,PETSC_DETERMINE,dRdw,ierr) call MatMFFDSetFunction(dRdw,FormFunction2,ctx,ierr) call MatAssemblyBegin(dRdw,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(dRdw,MAT_FINAL_ASSEMBLY,ierr) The function prototype for FormFunction2 is: subroutine FormFunction2(mfmat,wVec,rVec,ierr) Mat mfmat Vec wVec, rVec PetscInt ierr end subroutine FormFunction2 When I try to use this in a KSP linear solve I get the following traceback: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatMult line 1877 src/mat/interface/matrix.c [0]PETSC ERROR: [0] PCApplyBAorAB line 540 src/ksp/pc/interface/precon.c [0]PETSC ERROR: [0] GMREScycle line 132 src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_GMRES line 227 src/ksp/ksp/impls/gmres/gmres.c If I use KSPPREONLY it works fine. Thank you, Gaetan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 9 18:07:12 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 9 Feb 2011 18:07:12 -0600 Subject: [petsc-users] PETSc MatMFFDSetFunction Function In-Reply-To: References: Message-ID: <6F3BBCF6-BBC2-4CD6-9FC6-E2BCFA8A7BAB@mcs.anl.gov> Gaetan, This is not normally used by users since most people work with SNES and SNES provides a simple wrapper that uses it with the nonlinear function you provide to SNES.
If you are writing your own Newton's method and using KSP we recommend instead that you use SNES to handle the Newton's method for you. Anyways from Fortran the calling sequence is myfunction(void *ctx,Vec x,Vec y, integer ierr) where ierr is where you put 0 or an error code if you detect an error in your routine. You can pass PETSC_OBJECT_NULL as ctx and just no use it or you can pass an array or other Fortran thing that contains information that you wish to use in your function evaluation. If it still crashes after you get the right calling sequence you can use -start_in_debugger to see why it is crashing and quickly resolve the problem. Barry On Feb 9, 2011, at 5:43 PM, Gaetan Kenway wrote: > Hello > > I was wondering what the user supplied function is supposed to look like for setting the function in MatMFFDSetFunction. I am trying to use a Matrix-Free Matrix for a linear Krylov Solver. The website says: > PetscErrorCode PETSCMAT_DLLEXPORT MatMFFDSetFunction(Mat mat,PetscErrorCode (*func)(void*,Vec,Vec),void *funcctx) > > This indicates the function should have the calling sequence: (void *,Vec,Vec). Since there are zero examples of actually using this function, what exactly is the sequence? I gather that the second and third arguments are Vec x and Vec y where x is the input and y is the output, but what is the void * supposed to be. > > I'm doing this in Fortran, so I really don't know what argument "void *" should correspond to? > > Currently my code looks like this: > > ! Setup Matrix-Free dRdw matrix > call MatCreateMFFD(sumb_comm_world,nDimW,nDimW,& > PETSC_DETERMINE,PETSC_DETERMINE,dRdw,ierr) > > call MatMFFDSetFunction(dRdw,FormFunction2,ctx,ierr) > call MatAssemblyBegin(dRdw,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(dRdw,MAT_FINAL_ASSEMBLY,ierr) > > The function prototype for FormFunction2 is: > > subroutine FormFunction2(mfmat,wVec,rVec,ierr) > Mat mfmat > Vec wVec, rVec > PetscInt ierr > end subroutine FormFunction2 > > When I try to use this in a KSP linear solve I get the following tracsback: > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatMult line 1877 src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCApplyBAorAB line 540 src/ksp/pc/interface/precon.c > [0]PETSC ERROR: [0] GMREScycle line 132 src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 227 src/ksp/ksp/impls/gmres/gmres.c > > If I use KSPPREONLY it works fine. 
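Putting Barry's calling sequence above into the shape of the routine from the original post, the prototype would look roughly like the following; the declaration of the first (context) argument is an assumption and has to match whatever was actually passed as funcctx to MatMFFDSetFunction.

      subroutine FormFunction2(ctx, wVec, rVec, ierr)
      ! first argument is the user context passed to MatMFFDSetFunction,
      ! not the matrix; here it is assumed to be an application-defined array
      PetscScalar ctx(*)
      Vec wVec, rVec
      PetscErrorCode ierr
      ! compute rVec as the residual evaluated at the state wVec
      ierr = 0
      end subroutine FormFunction2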
> > Thank you, > > Gaetan From aron.ahmadia at kaust.edu.sa Thu Feb 10 07:44:09 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Thu, 10 Feb 2011 16:44:09 +0300 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: References: Message-ID: add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like this: --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ You may have to manually add the MPI libraries and their path as well, since BuildSystem tends to like these packaged together. Ping the list back if you can't figure it out from there. -Aron On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com < lvankampenhout at gmail.com> wrote: > Hi all, i'm having this error when configuring the latest petsc-dev on an > IBM PPC system. > > > TESTING: CxxMPICheck from > config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ------------------------------------------------------------------------------- > C++ error! mpi.h could not be located at: [] > > ******************************************************************************* > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 > --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 > --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 > --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 > --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 > --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" > --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict > -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto > -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar > --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real > PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 > --download-ml --download-hypre > > > vkampenh at p6012:~/install/petsc-dev> module list > Currently Loaded Modulefiles: > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > 2) java/ibm/1.5 5) fortran/ibm/13.1 > 3) c/ibm/11.1 6) sara > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > /opt/ibm/java2-ppc64-50/include/jvmpi.h > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > /opt/mpich/include/mpi.h > /usr/include/boost/mpi.hpp > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure > list, so that it will be recognized? The machine uses LoadLeveler schedule > system, which handles the MPI settings. > > Thanks, > Leo > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Thu Feb 10 11:58:23 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 10 Feb 2011 11:58:23 -0600 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: References: Message-ID: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> Aron, Shouldn't the mpcc and mpfort manage providing the include directories and libraries automatically (like everyone elses mpicc etc does?) Seems very cumbersome that users need to know they strange directories and include them themselves? A real step backwards in usability? Barry On Feb 10, 2011, at 7:44 AM, Aron Ahmadia wrote: > add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like this: > > --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ > > You may have to manually add the MPI libraries and their path as well, since BuildSystem tends to like these packaged together. Ping the list back if you can't figure it out from there. > > -Aron > > On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com wrote: > Hi all, i'm having this error when configuring the latest petsc-dev on an IBM PPC system. > > > TESTING: CxxMPICheck from config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > C++ error! mpi.h could not be located at: [] > ******************************************************************************* > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 --download-ml --download-hypre > > > vkampenh at p6012:~/install/petsc-dev> module list > Currently Loaded Modulefiles: > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > 2) java/ibm/1.5 5) fortran/ibm/13.1 > 3) c/ibm/11.1 6) sara > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > /opt/ibm/java2-ppc64-50/include/jvmpi.h > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > /opt/mpich/include/mpi.h > /usr/include/boost/mpi.hpp > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure list, so that it will be recognized? The machine uses LoadLeveler schedule system, which handles the MPI settings. 
> > Thanks, > Leo > > > From aron.ahmadia at kaust.edu.sa Thu Feb 10 13:45:36 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Thu, 10 Feb 2011 22:45:36 +0300 Subject: [petsc-users] [petsc-dev] configure option missing for MPI.h / on IBM machine In-Reply-To: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> References: <58665C2D-BB35-45F0-BAFB-85ADCB1168B9@mcs.anl.gov> Message-ID: I was wondering this myself. On the BlueGene line the MPI installation is based on MPICH, so the mpi* compilers behave as you'd expect on any other MPICH install. I'm not familiar with the voodoo in the IBM-HPC toolkit or the intricacies of this particular machine, but since it's obviously being administered by *somebody* (see the modules in Leo's environment), I'd expect the administrators to have gotten it right. A On Thu, Feb 10, 2011 at 8:58 PM, Barry Smith wrote: > > Aron, > > Shouldn't the mpcc and mpfort manage providing the include directories > and libraries automatically (like everyone elses mpicc etc does?) Seems very > cumbersome that users need to know they strange directories and include them > themselves? A real step backwards in usability? > > Barry > > On Feb 10, 2011, at 7:44 AM, Aron Ahmadia wrote: > > > add opt/ibmhpc/ppe.poe/include/ibmmpi/ to your ./configure options like > this: > > > > --with-mpi-include=/opt/ibmhpc/ppe.poe/include/ibmmpi/ > > > > You may have to manually add the MPI libraries and their path as well, > since BuildSystem tends to like these packaged together. Ping the list back > if you can't figure it out from there. > > > > -Aron > > > > On Thu, Feb 10, 2011 at 3:23 PM, lvankampenhout at gmail.com < > lvankampenhout at gmail.com> wrote: > > Hi all, i'm having this error when configuring the latest petsc-dev on an > IBM PPC system. > > > > > > TESTING: CxxMPICheck from > config.packages.MPI(/gpfs/h01/vkampenh/install/petsc-dev/config/BuildSystem/config/packages/MPI.py:611) > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > > ------------------------------------------------------------------------------- > > C++ error! 
mpi.h could not be located at: [] > > > ******************************************************************************* > > > > > > My configure options: ./configure --with-batch=1 --with-mpi-shared=0 > --with-endian=big --with-memcmp-ok --sizeof-void-p=8 --sizeof-char=1 > --sizeof-short=2 --sizeof-int=4 --sizeof-long=8 --sizeof-size-t=8 > --sizeof-long-long=8 --sizeof-float=4 --sizeof-double=8 --bits-per-byte=8 > --sizeof-MPI-Comm=8 --sizeof-MPI-Fint=4 --have-mpi-long-double=1 > --with-f90-interface=rs6000 --with-cc="mpcc -compiler xlc_r -q64" > --with-fc="mpfort -compiler xlf_r -q64" --FFLAGS="-O3 -qhot -qstrict > -qarch=auto -qtune=auto" --CFLAGS="-O3 -qhot -qstrict -qarch=auto > -qtune=auto" --LIBS=-lmass_64 --with-ar=/usr/bin/ar > --prefix=/sara/sw/petsc/3.0.0-p8/real --with-scalar-type=real > PETSC_ARCH=linux-ibm-pwr6-xlf-real-64 --with-shared=0 -with-debugging=0 > --download-ml --download-hypre > > > > > > vkampenh at p6012:~/install/petsc-dev> module list > > Currently Loaded Modulefiles: > > 1) compilerwrappers/yes 4) c++/ibm/11.1 7) upc/ibm/11.1 > > 2) java/ibm/1.5 5) fortran/ibm/13.1 > > 3) c/ibm/11.1 6) sara > > > > > > vkampenh at p6012:~/install/petsc-dev> locate mpi.h > > /opt/ibm/java2-ppc64-50/include/jvmpi.h > > /opt/ibmhpc/ppe.poe/include/ibmmpi/mpi.h > > /opt/mpich/include/mpi.h > > /usr/include/boost/mpi.hpp > > /usr/lib64/gcc/powerpc64-suse-linux/4.3/include/jvmpi.h > > /usr/lib64/mpi/gcc/openmpi/include/mpi.h > > /usr/lib64/mpi/gcc/openmpi/include/openmpi/ompi/mpi/f77/prototypes_mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/default/include/config/usb/serial/siemens/mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/ppc64/include/config/usb/serial/siemens/mpi.h > > > /usr/src/linux-2.6.32.27-0.2-obj/ppc64/trace/include/config/usb/serial/siemens/mpi.h > > /usr/src/linux-2.6.32.27-0.2/drivers/message/fusion/lsi/mpi.h > > > > > > Is there an easy way to add the IBMHPC/PPE.POE directory to the configure > list, so that it will be recognized? The machine uses LoadLeveler schedule > system, which handles the MPI settings. > > > > Thanks, > > Leo > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Fri Feb 11 10:37:14 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Fri, 11 Feb 2011 18:37:14 +0200 Subject: [petsc-users] Guidelines for solving the euler equations with an implicit matrix free approach Message-ID: <4D5565BA.5040403@lycos.com> Dear Petsc team, I'm new in Petsc and I'm trying to solve the euler equations of fluid dynamics using an implicit matrix free approach with a spatial discontinues galerkin discretization. I need some directions about how can I solve the following system: (M/dt+dR/du)*DU=R where R denotes the residual of the system and dR/du the residual jacobian. Please help. 
Kostas From ckontzialis at lycos.com Fri Feb 11 10:37:57 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Fri, 11 Feb 2011 18:37:57 +0200 Subject: [petsc-users] mail Message-ID: <4D5565E5.8090502@lycos.com> ckontzialis at lycos.com From knepley at gmail.com Sat Feb 12 16:12:36 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Feb 2011 16:12:36 -0600 Subject: [petsc-users] Guidelines for solving the euler equations with an implicit matrix free approach In-Reply-To: <4D5565BA.5040403@lycos.com> References: <4D5565BA.5040403@lycos.com> Message-ID: On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis < ckontzialis at lycos.com> wrote: > Dear Petsc team, > > I'm new in Petsc and I'm trying to solve the euler equations of fluid > dynamics > using an implicit matrix free approach with a spatial discontinues galerkin > discretization. > I need some directions about how can I solve the following system: > > (M/dt+dR/du)*DU=R > > where R denotes the residual of the system and dR/du the residual jacobian. > Please help. > Petsc provides linear algebra and nonlinear solvers. This is fine once you have discretized. It sounds like you will use DG: a) on a structured or unstructured grid? The PETSc DA supports structured grids in any dimension. After this, you want to use the TS object to present your system. There are many examples in the TS, e.g. ex10 for radiation-diffusion or ex14 for hydrostatic ice flow. Once you have your problem producing the correct residual and Jacobian, we can talk about solvers. Matt > Kostas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From khalid_eee at yahoo.com Sat Feb 12 20:47:02 2011 From: khalid_eee at yahoo.com (khalid ashraf) Date: Sat, 12 Feb 2011 18:47:02 -0800 (PST) Subject: [petsc-users] Reading vtk file in parallel and assigning to an array Message-ID: <388810.1602.qm@web112617.mail.gq1.yahoo.com> Hi, I have a .vtk file that I want to read. I want to read one floating point from each line of the file and assign the value to an array. If I use standard C commands like fscanf(), then it works on single processor but doesn't keep the right order when run on on multi processors. Could you please give a small code snippet to do it the PETSC way in parallel ? Thanks. ____________________________________________________________________________________ Never miss an email again! Yahoo! Toolbar alerts you the instant new Mail arrives. http://tools.search.yahoo.com/toolbar/features/mail/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Feb 12 21:41:51 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Feb 2011 21:41:51 -0600 Subject: [petsc-users] Reading vtk file in parallel and assigning to an array In-Reply-To: <388810.1602.qm@web112617.mail.gq1.yahoo.com> References: <388810.1602.qm@web112617.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 12, 2011 at 8:47 PM, khalid ashraf wrote: > Hi, I have a .vtk file that I want to read. I want to read one floating > point from each line of the file and assign the value to an array. > If I use standard C commands like fscanf(), then it works on single > processor but doesn't keep the right order when run on on multi processors. > Could you please give a small code snippet to do it the PETSC way in > parallel ? 
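A rough sketch of the approach suggested in the reply just below: read the ASCII data on a single process, write it out in PETSc binary format, and load that file in parallel afterwards. The file names, the one-value-per-line format, and the missing error checks are assumptions, and the calls follow a newer PETSc API than the 3.1-era one used elsewhere in this thread.

#include <stdio.h>
#include <petscvec.h>

/* Serial converter: one value per line of ASCII input -> PETSc binary vector. */
int main(int argc, char **argv)
{
  Vec            v;
  PetscViewer    viewer;
  FILE           *fp;
  PetscInt       i, n = 0;
  double         val;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  fp = fopen("data.txt", "r");                       /* run this on one process only */
  while (fscanf(fp, "%lf", &val) == 1) n++;          /* pass 1: count the entries */
  rewind(fp);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &v);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                          /* pass 2: fill in file order */
    if (fscanf(fp, "%lf", &val) != 1) break;
    ierr = VecSetValue(v, i, (PetscScalar)val, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);
  ierr = VecAssemblyBegin(v);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(v);CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "data.dat", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(v, viewer);CHKERRQ(ierr);           /* writes the PETSc binary format */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = VecDestroy(&v);CHKERRQ(ierr);
  return PetscFinalize();
}

The parallel program then opens the same file with PetscViewerBinaryOpen on PETSC_COMM_WORLD and calls VecLoad (the exact VecLoad calling sequence differs between the 3.1-era release used in this thread and later releases); each rank receives a contiguous block of the vector in the original file order.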
> The easiest way is to read it in serial, and save it in PETSc binary format. Then it can be loaded in parallel. Matt > Thanks. > > > > ------------------------------ > Don't be flakey. Get Yahoo! Mail for Mobileand > always stay connectedto friends. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Sun Feb 13 16:37:11 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 13 Feb 2011 17:37:11 -0500 Subject: [petsc-users] Better way to pre-allocate memory for matrix being read in ??? Message-ID: Hi, I have a text file containing the non-zero entries of a sparse matrix of dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, element] format. That is ALL the information I have regarding the matrix. However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be known, nz=number of nonzeros per row (same for all rows) nnz=array containing the number of nonzeros in the various rows (possibly different for each row) or PETSC_NULL which i do not know for my matrix, ( unless I resort to using MATLAB. ). Does that mean I have to set nz= 1274 (the length of a row) and nnz=PETSC_NULL ? Though, I guess this setting would consume a lot of memory for higher order matrices. How then, should I go about memory pre-allocation more efficiently? Thanks, Gaurish Telang There is a code in the PETSc folder (/src/mat/examples/tests/ex78.c ) which reads in a matrix of this format. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sun Feb 13 16:47:46 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 13 Feb 2011 23:47:46 +0100 Subject: [petsc-users] Better way to pre-allocate memory for matrix being read in ??? In-Reply-To: References: Message-ID: On Sun, Feb 13, 2011 at 23:37, Gaurish Telang wrote: > I have a text file containing the non-zero entries of a sparse matrix of > dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, > element] format. > This is a horrible format and can not scale. If it becomes a performance issue, change the format. > > That is ALL the information I have regarding the matrix. > > However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat > B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be > known, > > nz=number of nonzeros per row (same for all rows) > nnz=array containing the number of nonzeros in the various rows (possibly > different for each row) or PETSC_NULL > > which i do not know for my matrix, ( unless I resort to using MATLAB. ). > > Does that mean I have to set nz= 1274 (the length of a row) and > nnz=PETSC_NULL ? > No, read the file twice. The first time through, just count the number of nonzeros (per row), then set preallocation with the correct size, and finally read the file a second time calling MatSetValue() for each entry. For a matrix this small, you could just read it in without preallocating, but that will get too expensive quickly if you increase the matrix size. -------------- next part -------------- An HTML attachment was scrubbed... 
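A sketch of the two-pass scheme described above for the 2683 x 1274 matrix stored as (row, column, value) triples: count the nonzeros per row, preallocate with those counts, then read the file again and insert. The file name and the assumption of 0-based indices are made up for illustration, and PetscCalloc1 is from newer PETSc releases.

#include <stdio.h>
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  FILE           *fp;
  PetscInt       m = 2683, n = 1274, *nnz;   /* sizes taken from the question */
  int            ii, jj;
  double         val;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscCalloc1(m, &nnz);CHKERRQ(ierr);

  /* pass 1: count nonzeros per row (assumes 0-based indices; subtract 1 if the file is 1-based) */
  fp = fopen("matrix.txt", "r");
  while (fscanf(fp, "%d %d %lf", &ii, &jj, &val) == 3) nnz[ii]++;
  rewind(fp);

  /* preallocate with the exact per-row counts, then insert in pass 2 */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 0, nnz, &A);CHKERRQ(ierr);
  while (fscanf(fp, "%d %d %lf", &ii, &jj, &val) == 3) {
    ierr = MatSetValue(A, ii, jj, (PetscScalar)val, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree(nnz);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  return PetscFinalize();
}

For a parallel MPIAIJ matrix the same idea applies, except that the first pass has to count diagonal-block and off-diagonal-block entries separately and feed MatMPIAIJSetPreallocation.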
URL: From gaurish108 at gmail.com Sun Feb 13 18:16:47 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Sun, 13 Feb 2011 19:16:47 -0500 Subject: [petsc-users] Question on LOCDIR Message-ID: Hi, I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the LOCDIR variable defined at the top, to be the current working directory. Whereas in the makefile chapter of the manual, the given template makefile has no mention of the LOCDIR variable. So, what is the significance of this variable? My PETSc programs seem to compile fine without introducing it into my makefile. Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Sun Feb 13 20:12:13 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 04:12:13 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: References: Message-ID: <4D588F7D.2060209@lycos.com> On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Re: Guidelines for solving the euler equations with an > implicit matrix free approach (Matthew Knepley) > 2. Reading vtk file in parallel and assigning to an array > (khalid ashraf) > 3. Re: Reading vtk file in parallel and assigning to an array > (Matthew Knepley) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 12 Feb 2011 16:12:36 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] Guidelines for solving the euler equations > with an implicit matrix free approach > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Dear Petsc team, >> >> I'm new in Petsc and I'm trying to solve the euler equations of fluid >> dynamics >> using an implicit matrix free approach with a spatial discontinues galerkin >> discretization. >> I need some directions about how can I solve the following system: >> >> (M/dt+dR/du)*DU=R >> >> where R denotes the residual of the system and dR/du the residual jacobian. >> Please help. >> > Petsc provides linear algebra and nonlinear solvers. This is fine once you > have discretized. > It sounds like you will use DG: > > a) on a structured or unstructured grid? > > The PETSc DA supports structured grids in any dimension. After this, you > want to use the > TS object to present your system. There are many examples in the TS, e.g. > ex10 for > radiation-diffusion or ex14 for hydrostatic ice flow. > > Once you have your problem producing the correct residual and Jacobian, we > can talk > about solvers. > > Matt > > >> Kostas >> Mat, Thank you for your reply. I have done the discretization and the residual and jacobian are computes correctly. Furthermore, I managed to do some calculation using TS but with an explicit scheme. I need to work with an implicit time discretization and I have read in a quite few papers that they follow the matrix free approach. 
Thank you, Kostas From knepley at gmail.com Sun Feb 13 20:18:42 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 13 Feb 2011 20:18:42 -0600 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: <4D588F7D.2060209@lycos.com> References: <4D588F7D.2060209@lycos.com> Message-ID: On Sun, Feb 13, 2011 at 8:12 PM, Kontsantinos Kontzialis < ckontzialis at lycos.com> wrote: > Message: 1 >> Date: Sat, 12 Feb 2011 16:12:36 -0600 >> From: Matthew Knepley >> Subject: Re: [petsc-users] Guidelines for solving the euler equations >> with an implicit matrix free approach >> To: PETSc users list >> Message-ID: >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >> ckontzialis at lycos.com> wrote: >> >> Dear Petsc team, >>> >>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>> dynamics >>> using an implicit matrix free approach with a spatial discontinues >>> galerkin >>> discretization. >>> I need some directions about how can I solve the following system: >>> >>> (M/dt+dR/du)*DU=R >>> >>> where R denotes the residual of the system and dR/du the residual >>> jacobian. >>> Please help. >>> >>> Petsc provides linear algebra and nonlinear solvers. This is fine once >> you >> have discretized. >> It sounds like you will use DG: >> >> a) on a structured or unstructured grid? >> >> The PETSc DA supports structured grids in any dimension. After this, you >> want to use the >> TS object to present your system. There are many examples in the TS, e.g. >> ex10 for >> radiation-diffusion or ex14 for hydrostatic ice flow. >> >> Once you have your problem producing the correct residual and Jacobian, we >> can talk >> about solvers. >> >> Matt >> >> >> Kostas >>> >>> Mat, > > Thank you for your reply. I have done the discretization and the residual > and jacobian are computes correctly. Furthermore, I managed to do some > calculation using TS but with an explicit scheme. I need to work with an > implicit time discretization and I have read in a quite few papers that they > follow the matrix free approach. > Once you plug your Residual and Jacobian into the TS, you can start to try out different solvers. Is this working? Thanks, Matt > Thank you, > > Kostas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckontzialis at lycos.com Sun Feb 13 20:24:16 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 04:24:16 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 30 In-Reply-To: References: Message-ID: <4D589250.4000607@lycos.com> On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Re: Guidelines for solving the euler equations with an > implicit matrix free approach (Matthew Knepley) > 2. 
Reading vtk file in parallel and assigning to an array > (khalid ashraf) > 3. Re: Reading vtk file in parallel and assigning to an array > (Matthew Knepley) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 12 Feb 2011 16:12:36 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] Guidelines for solving the euler equations > with an implicit matrix free approach > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Dear Petsc team, >> >> I'm new in Petsc and I'm trying to solve the euler equations of fluid >> dynamics >> using an implicit matrix free approach with a spatial discontinues galerkin >> discretization. >> I need some directions about how can I solve the following system: >> >> (M/dt+dR/du)*DU=R >> >> where R denotes the residual of the system and dR/du the residual jacobian. >> Please help. >> > Petsc provides linear algebra and nonlinear solvers. This is fine once you > have discretized. > It sounds like you will use DG: > > a) on a structured or unstructured grid? > > The PETSc DA supports structured grids in any dimension. After this, you > want to use the > TS object to present your system. There are many examples in the TS, e.g. > ex10 for > radiation-diffusion or ex14 for hydrostatic ice flow. > > Once you have your problem producing the correct residual and Jacobian, we > can talk > about solvers. > > Matt > > >> Kostas >> Matt, Also, I work on unstructured grids in 2d. Kostas From bsmith at mcs.anl.gov Sun Feb 13 20:38:33 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 13 Feb 2011 20:38:33 -0600 Subject: [petsc-users] Question on LOCDIR In-Reply-To: References: Message-ID: LOCDIR is used when making manual pages and links to examples. It is not needed for running codes Barry On Feb 13, 2011, at 6:16 PM, Gaurish Telang wrote: > Hi, > > I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the LOCDIR variable defined at the top, to be the current working directory. Whereas in the makefile chapter of the manual, > > the given template makefile has no mention of the LOCDIR variable. So, what is the significance of this variable? > > My PETSc programs seem to compile fine without introducing it into my makefile. > > Gaurish > From ckontzialis at lycos.com Sun Feb 13 21:09:51 2011 From: ckontzialis at lycos.com (Kontsantinos Kontzialis) Date: Mon, 14 Feb 2011 05:09:51 +0200 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 31 In-Reply-To: References: Message-ID: <4D589CFF.7000703@lycos.com> On 02/14/2011 04:38 AM, petsc-users-request at mcs.anl.gov wrote: > Send petsc-users mailing list submissions to > petsc-users at mcs.anl.gov > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users > or, via email, send a message with subject or body 'help' to > petsc-users-request at mcs.anl.gov > > You can reach the person managing the list at > petsc-users-owner at mcs.anl.gov > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of petsc-users digest..." > > > Today's Topics: > > 1. Better way to pre-allocate memory for matrix being read in > ??? (Gaurish Telang) > 2. Re: Better way to pre-allocate memory for matrix being read > in ??? (Jed Brown) > 3. Question on LOCDIR (Gaurish Telang) > 4. 
Re: petsc-users Digest, Vol 26, Issue 30 > (Kontsantinos Kontzialis) > 5. Re: petsc-users Digest, Vol 26, Issue 30 (Matthew Knepley) > 6. Re: petsc-users Digest, Vol 26, Issue 30 > (Kontsantinos Kontzialis) > 7. Re: Question on LOCDIR (Barry Smith) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 13 Feb 2011 17:37:11 -0500 > From: Gaurish Telang > Subject: [petsc-users] Better way to pre-allocate memory for matrix > being read in ??? > To: petsc-users at mcs.anl.gov > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > I have a text file containing the non-zero entries of a sparse matrix of > dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, > element] format. > > That is ALL the information I have regarding the matrix. > > However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat > B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be > known, > > nz=number of nonzeros per row (same for all rows) > nnz=array containing the number of nonzeros in the various rows (possibly > different for each row) or > PETSC_NULL > > which i do not know for my matrix, ( unless I resort to using MATLAB. ). > > Does that mean I have to set nz= 1274 (the length of a row) and > nnz=PETSC_NULL ? Though, I guess this setting would consume a lot of > memory for higher order matrices. > > How then, should I go about memory pre-allocation more efficiently? > > Thanks, > Gaurish Telang > > > > > > > > > > There is a code in the PETSc folder (/src/mat/examples/tests/ex78.c ) which > reads in a matrix of this format. > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Sun, 13 Feb 2011 23:47:46 +0100 > From: Jed Brown > Subject: Re: [petsc-users] Better way to pre-allocate memory for > matrix being read in ??? > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > On Sun, Feb 13, 2011 at 23:37, Gaurish Telang wrote: > >> I have a text file containing the non-zero entries of a sparse matrix of >> dimension 2683x1274, stored in the form (row, column, element) i.e. [i, j, >> element] format. >> > This is a horrible format and can not scale. If it becomes a performance > issue, change the format. > > >> That is ALL the information I have regarding the matrix. >> >> However when pre-allocating memory with MatSeqAIJSetPreallocation(Mat >> B,PetscInt nz,const PetscInt nnz[]), the parameters nz and nnz need to be >> known, >> >> nz=number of nonzeros per row (same for all rows) >> nnz=array containing the number of nonzeros in the various rows (possibly >> different for each row) or PETSC_NULL >> >> which i do not know for my matrix, ( unless I resort to using MATLAB. ). >> >> Does that mean I have to set nz= 1274 (the length of a row) and >> nnz=PETSC_NULL ? >> > No, read the file twice. The first time through, just count the number of > nonzeros (per row), then set preallocation with the correct size, and > finally read the file a second time calling MatSetValue() for each entry. > > For a matrix this small, you could just read it in without preallocating, > but that will get too expensive quickly if you increase the matrix size. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 3 > Date: Sun, 13 Feb 2011 19:16:47 -0500 > From: Gaurish Telang > Subject: [petsc-users] Question on LOCDIR > To: petsc-users at mcs.anl.gov > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > I notice that in the tutorial codes in $PETSC_DIR , the makefiles have the > LOCDIR variable defined at the top, to be the current working directory. > Whereas in the makefile chapter of the manual, > > the given template makefile has no mention of the LOCDIR variable. So, what > is the significance of this variable? > > My PETSc programs seem to compile fine without introducing it into my > makefile. > > Gaurish > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 4 > Date: Mon, 14 Feb 2011 04:12:13 +0200 > From: Kontsantinos Kontzialis > Subject: Re: [petsc-users] petsc-users Digest, Vol 26, Issue 30 > To: petsc-users at mcs.anl.gov > Message-ID:<4D588F7D.2060209 at lycos.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > On 02/13/2011 08:00 PM, petsc-users-request at mcs.anl.gov wrote: >> Send petsc-users mailing list submissions to >> petsc-users at mcs.anl.gov >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://lists.mcs.anl.gov/mailman/listinfo/petsc-users >> or, via email, send a message with subject or body 'help' to >> petsc-users-request at mcs.anl.gov >> >> You can reach the person managing the list at >> petsc-users-owner at mcs.anl.gov >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of petsc-users digest..." >> >> >> Today's Topics: >> >> 1. Re: Guidelines for solving the euler equations with an >> implicit matrix free approach (Matthew Knepley) >> 2. Reading vtk file in parallel and assigning to an array >> (khalid ashraf) >> 3. Re: Reading vtk file in parallel and assigning to an array >> (Matthew Knepley) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sat, 12 Feb 2011 16:12:36 -0600 >> From: Matthew Knepley >> Subject: Re: [petsc-users] Guidelines for solving the euler equations >> with an implicit matrix free approach >> To: PETSc users list >> Message-ID: >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >> ckontzialis at lycos.com> wrote: >> >>> Dear Petsc team, >>> >>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>> dynamics >>> using an implicit matrix free approach with a spatial discontinues galerkin >>> discretization. >>> I need some directions about how can I solve the following system: >>> >>> (M/dt+dR/du)*DU=R >>> >>> where R denotes the residual of the system and dR/du the residual jacobian. >>> Please help. >>> >> Petsc provides linear algebra and nonlinear solvers. This is fine once you >> have discretized. >> It sounds like you will use DG: >> >> a) on a structured or unstructured grid? >> >> The PETSc DA supports structured grids in any dimension. After this, you >> want to use the >> TS object to present your system. There are many examples in the TS, e.g. >> ex10 for >> radiation-diffusion or ex14 for hydrostatic ice flow. >> >> Once you have your problem producing the correct residual and Jacobian, we >> can talk >> about solvers. >> >> Matt >> >> >>> Kostas >>> > Mat, > > Thank you for your reply. 
I have done the discretization and the > residual and jacobian are computes correctly. Furthermore, I managed to > do some calculation using TS but with an explicit scheme. I need to work > with an implicit time discretization and I have read in a quite few > papers that they follow the matrix free approach. > > Thank you, > > Kostas > > > ------------------------------ > > Message: 5 > Date: Sun, 13 Feb 2011 20:18:42 -0600 > From: Matthew Knepley > Subject: Re: [petsc-users] petsc-users Digest, Vol 26, Issue 30 > To: PETSc users list > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Sun, Feb 13, 2011 at 8:12 PM, Kontsantinos Kontzialis< > ckontzialis at lycos.com> wrote: > >> Message: 1 >>> Date: Sat, 12 Feb 2011 16:12:36 -0600 >>> From: Matthew Knepley >>> Subject: Re: [petsc-users] Guidelines for solving the euler equations >>> with an implicit matrix free approach >>> To: PETSc users list >>> Message-ID: >>> >>> Content-Type: text/plain; charset="iso-8859-1" >>> >>> On Fri, Feb 11, 2011 at 10:37 AM, Kontsantinos Kontzialis< >>> ckontzialis at lycos.com> wrote: >>> >>> Dear Petsc team, >>>> I'm new in Petsc and I'm trying to solve the euler equations of fluid >>>> dynamics >>>> using an implicit matrix free approach with a spatial discontinues >>>> galerkin >>>> discretization. >>>> I need some directions about how can I solve the following system: >>>> >>>> (M/dt+dR/du)*DU=R >>>> >>>> where R denotes the residual of the system and dR/du the residual >>>> jacobian. >>>> Please help. >>>> >>>> Petsc provides linear algebra and nonlinear solvers. This is fine once >>> you >>> have discretized. >>> It sounds like you will use DG: >>> >>> a) on a structured or unstructured grid? >>> >>> The PETSc DA supports structured grids in any dimension. After this, you >>> want to use the >>> TS object to present your system. There are many examples in the TS, e.g. >>> ex10 for >>> radiation-diffusion or ex14 for hydrostatic ice flow. >>> >>> Once you have your problem producing the correct residual and Jacobian, we >>> can talk >>> about solvers. >>> >>> Matt >>> >>> >>> Kostas >>>> Mat, >> Thank you for your reply. I have done the discretization and the residual >> and jacobian are computes correctly. Furthermore, I managed to do some >> calculation using TS but with an explicit scheme. I need to work with an >> implicit time discretization and I have read in a quite few papers that they >> follow the matrix free approach. >> > Once you plug your Residual and Jacobian into the TS, you can start to try > out different solvers. Is this working? 
> > Thanks, > > Matt > > >> Thank you, >> >> Kostas >> > > Matt, here is a fragment from my code where I try to work on the implicit scheme // Apply initial conditions ierr = initial_conditions(sys); CHKERRQ(ierr); ierr = TSCreate(sys.comm, &sys.ts); CHKERRQ(ierr); ierr = TSSetSolution(sys.ts, sys.gsv); CHKERRQ(ierr); ierr = TSSetFromOptions(sys.ts); CHKERRQ(ierr); ierr = TSSetProblemType(sys.ts, TS_NONLINEAR); CHKERRQ(ierr); ierr = TSSetType(sys.ts, TSBEULER); CHKERRQ(ierr); ierr = TSSetRHSFunction(sys.ts, base_residual, &sys); CHKERRQ(ierr); ierr = TSGetSNES(sys.ts, &sys.snes); CHKERRQ(ierr); ierr = SNESSetFromOptions(sys.snes); CHKERRQ(ierr); ierr = MatCreateSNESMF(sys.snes, &sys.J); CHKERRQ(ierr); ierr = TSSetRHSJacobian(sys.ts, sys.J, sys.J, jacobian_matrix, &sys); CHKERRQ(ierr); ierr = SNESGetKSP(sys.snes, &sys.ksp); CHKERRQ(ierr) ierr = MatScale(sys.M, 1.0 / sys.con->dt); CHKERRQ(ierr); ierr = MatAYPX(sys.J, -1.0, sys.M, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr); ierr = KSPSetOperators(sys.ksp, sys.J, sys.J, SAME_NONZERO_PATTERN); CHKERRQ(ierr); sys.con->j = 0; sys.con->tm = 0; ierr = TSSetDuration(sys.ts, 10000, sys.con->etime); CHKERRQ(ierr); ierr = TSMonitorSet(sys.ts, monitor, &sys, PETSC_NULL); CHKERRQ(ierr); ierr = PetscMalloc (sys.ldof*sizeof (PetscScalar ),&sys.Lim); CHKERRQ(ierr); ierr = TSSetSolution(sys.ts, PETSC_NULL); CHKERRQ(ierr); sys is my application context. Petsc tells me that I should call first the snessetfunction due to MatCreateSNESMF, 'cause I want to use an MF approach. what can I do? on an explicit run I do not use the jacobian, but how can I use it there now with a MF? Im confused. Thank you, kostas From bsmith at mcs.anl.gov Sun Feb 13 21:45:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 13 Feb 2011 21:45:47 -0600 Subject: [petsc-users] petsc-users Digest, Vol 26, Issue 31 In-Reply-To: <4D589CFF.7000703@lycos.com> References: <4D589CFF.7000703@lycos.com> Message-ID: On Feb 13, 2011, at 9:09 PM, Kontsantinos Kontzialis wrote: Don't get the snes and ksp and do kspsetoperators() instead use TSSetRHSJacobian() Do not use MatCreateSNESMF() that is for working with the nonlinear solver use MatCreateMFFD() you then need to call MatMFFDSetFunction() to give it the function it will provide the derivative of. 
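A compressed C outline of the sequence described here and completed in the next sentence: create the matrix-free operator with MatCreateMFFD, give it the residual with MatMFFDSetFunction, and have the TSSetRHSJacobian callback do nothing but refresh the base point with MatMFFDSetBase and assemble. The context, function names, and sizes are invented; the Jacobian callback signature shown is the one used by newer PETSc releases (the 3.1-era interface passes Mat pointers and a MatStructure flag); and how the M/dt mass-matrix shift is combined with this operator depends on the PETSc version, so treat this as an outline rather than a drop-in replacement.

#include <petscts.h>

typedef struct { PetscReal t; } AppCtx;   /* illustrative context; holds whatever the residual needs */

/* The residual in the (void*,Vec,Vec) form that MatMFFDSetFunction expects */
static PetscErrorCode MFFDResidual(void *ctx, Vec U, Vec R)
{
  AppCtx *user = (AppCtx*)ctx;
  PetscFunctionBeginUser;
  /* evaluate R = R(U) here, using user->t if the residual is time dependent */
  (void)user; (void)U; (void)R;
  PetscFunctionReturn(0);
}

/* TSSetRHSJacobian callback: no entries are computed, only the differencing base point is refreshed */
static PetscErrorCode RHSJacobianMF(TS ts, PetscReal t, Vec U, Mat J, Mat P, void *ctx)
{
  AppCtx        *user = (AppCtx*)ctx;
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  user->t = t;
  ierr = MatMFFDSetBase(J, U, NULL);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Wiring to be called from the existing setup code (ts, ndof, and user already exist there) */
static PetscErrorCode SetupMatrixFreeJacobian(TS ts, PetscInt ndof, AppCtx *user, Mat *J)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = MatCreateMFFD(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ndof, ndof, J);CHKERRQ(ierr);
  ierr = MatMFFDSetFunction(*J, MFFDResidual, user);CHKERRQ(ierr);
  ierr = TSSetRHSJacobian(ts, *J, *J, RHSJacobianMF, user);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}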
You need to pass a function to TSSetRHSJacobian() that sets the MatMFFDSetBase() and then calls MatAssemblyBegin/End() Barry > > here is a fragment from my code where I try to work on the implicit scheme > > // Apply initial conditions > ierr = initial_conditions(sys); > CHKERRQ(ierr); > > ierr = TSCreate(sys.comm, &sys.ts); > CHKERRQ(ierr); > > ierr = TSSetSolution(sys.ts, sys.gsv); > CHKERRQ(ierr); > > ierr = TSSetFromOptions(sys.ts); > CHKERRQ(ierr); > > ierr = TSSetProblemType(sys.ts, TS_NONLINEAR); > CHKERRQ(ierr); > > ierr = TSSetType(sys.ts, TSBEULER); > CHKERRQ(ierr); > > ierr = TSSetRHSFunction(sys.ts, base_residual, &sys); > CHKERRQ(ierr); > > ierr = TSGetSNES(sys.ts, &sys.snes); > CHKERRQ(ierr); > > ierr = SNESSetFromOptions(sys.snes); > CHKERRQ(ierr); > > ierr = MatCreateSNESMF(sys.snes, &sys.J); > CHKERRQ(ierr); > > ierr = TSSetRHSJacobian(sys.ts, sys.J, sys.J, jacobian_matrix, &sys); > CHKERRQ(ierr); > > ierr = SNESGetKSP(sys.snes, &sys.ksp); > CHKERRQ(ierr) > > ierr = MatScale(sys.M, 1.0 / sys.con->dt); > CHKERRQ(ierr); > > ierr = MatAYPX(sys.J, -1.0, sys.M, DIFFERENT_NONZERO_PATTERN); > CHKERRQ(ierr); > > ierr = KSPSetOperators(sys.ksp, sys.J, sys.J, SAME_NONZERO_PATTERN); > CHKERRQ(ierr); > > sys.con->j = 0; > sys.con->tm = 0; > > ierr = TSSetDuration(sys.ts, 10000, sys.con->etime); > CHKERRQ(ierr); > > ierr = TSMonitorSet(sys.ts, monitor, &sys, PETSC_NULL); > CHKERRQ(ierr); > > ierr = PetscMalloc (sys.ldof*sizeof (PetscScalar ),&sys.Lim); > CHKERRQ(ierr); > > ierr = TSSetSolution(sys.ts, PETSC_NULL); > CHKERRQ(ierr); > > sys is my application context. Petsc tells me that I should call first the snessetfunction due to > MatCreateSNESMF, 'cause I want to use an MF approach. what can I do? on an explicit run I do > not use the jacobian, but how can I use it there now with a MF? Im confused. > > Thank you, > > kostas From tomjan at jay.au.poznan.pl Mon Feb 14 03:13:51 2011 From: tomjan at jay.au.poznan.pl (Tomasz Jankowski) Date: Mon, 14 Feb 2011 10:13:51 +0100 (CET) Subject: [petsc-users] query about parallel REML Message-ID: Hello All, I'm looking for some opensource/free copy of parallel reml (best based on PETSC). I have found old post at https://stat.ethz.ch/pipermail/r-help/2004-May/050436.html which is directing to acre-developers at eml.pnl.gov. But it seems that this email is not active. I have also try with JM.Malard at pnl.gov email but it's also not active. So I'm writing here. Does anyone have copy of this software? Could you share it? Many Thanks, Tomasz Jankowski ######################################################## # tomjan at jay.au.poznan.pl # # jay.au.poznan.pl/~tomjan/ # ######################################################## From fernandez858 at gmail.com Mon Feb 14 04:27:51 2011 From: fernandez858 at gmail.com (Michel Cancelliere) Date: Mon, 14 Feb 2011 11:27:51 +0100 Subject: [petsc-users] Solver Parameter Optimization Message-ID: Dear users, I've implemented a simple hydrocarbon reservoir simulator using PETSc, the simulator is used inside an iterative loop in which thousand of simulations are run with different input parameters(In order to calibrate the properties of the model). I would like to use those iterations to tuneup the parameters of the solver (precoditioner,type of linear solver, restart, etc...), Have someone working with that?, Do you know some papers where I can some information about that? Thank you for your time, Michel -------------- next part -------------- An HTML attachment was scrubbed... 
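This does not address the automatic-tuning question itself, but as a sketch of the mechanics: the solver settings can be swept from inside the calibration loop through the options database, so thousands of runs can try different configurations without recompiling. The candidate strings and the function name are invented, and PetscOptionsInsertString takes only the string (no options object) in older PETSc releases.

#include <petscksp.h>

/* Candidate solver settings to sweep; purely illustrative. */
static const char *candidates[] = {
  "-ksp_type gmres -ksp_gmres_restart 30 -pc_type bjacobi",
  "-ksp_type bcgs -pc_type asm",
  "-ksp_type gmres -pc_type sor",
};

/* Solve the same system once per candidate and report the iteration count.
   Note that inserted options accumulate in the global database; a real sweep would
   clear them or give each configuration its own options prefix. */
static PetscErrorCode TrySolverCandidates(Mat A, Vec b, Vec x)
{
  PetscInt       i, its;
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  for (i = 0; i < 3; i++) {
    ierr = PetscOptionsInsertString(NULL, candidates[i]);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* picks up the inserted options */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "%s : %d iterations\n", candidates[i], (int)its);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}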
URL: From aron.ahmadia at kaust.edu.sa Mon Feb 14 04:35:50 2011 From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia) Date: Mon, 14 Feb 2011 13:35:50 +0300 Subject: [petsc-users] Solver Parameter Optimization In-Reply-To: References: Message-ID: I've seen a few threads in this direction: See Sanjukta Bhowmick's work on combining machine learning with PETSc to start: http://cs.unomaha.edu/~bhowmick/Blog/Entries/2010/9/12_Solvers_for_Large_Sparse_Linear_Systems.html HYPRE has something along the lines of this as well, but I have not seen any promising results. Don't forget that even slightly different problems can have wildly different convergence properties, you want a solver that is both fast and robust to changes in your input parameters. A On Mon, Feb 14, 2011 at 1:27 PM, Michel Cancelliere wrote: > Dear users, > > I've implemented a simple hydrocarbon reservoir simulator using PETSc, the > simulator is used inside an iterative loop in which thousand of simulations > are run with different input parameters(In order to calibrate the properties > of the model). I would like to use those iterations to tuneup the parameters > of the solver (precoditioner,type of linear solver, restart, etc...), Have > someone working with that?, Do you know some papers where I can some > information about that? > > Thank you for your time, > > Michel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fernandez858 at gmail.com Mon Feb 14 04:40:19 2011 From: fernandez858 at gmail.com (Michel Cancelliere) Date: Mon, 14 Feb 2011 11:40:19 +0100 Subject: [petsc-users] Solver Parameter Optimization In-Reply-To: References: Message-ID: Thank you Aron, I'll check it On Mon, Feb 14, 2011 at 11:35 AM, Aron Ahmadia wrote: > I've seen a few threads in this direction: > > See Sanjukta Bhowmick's work on combining machine learning with PETSc to > start: > http://cs.unomaha.edu/~bhowmick/Blog/Entries/2010/9/12_Solvers_for_Large_Sparse_Linear_Systems.html > > HYPRE has something along the lines of this as well, but I have not seen > any promising results. > > Don't forget that even slightly different problems can have wildly > different convergence properties, you want a solver that is both fast and > robust to changes in your input parameters. > > A > > > On Mon, Feb 14, 2011 at 1:27 PM, Michel Cancelliere < > fernandez858 at gmail.com> wrote: > >> Dear users, >> >> I've implemented a simple hydrocarbon reservoir simulator using PETSc, the >> simulator is used inside an iterative loop in which thousand of simulations >> are run with different input parameters(In order to calibrate the properties >> of the model). I would like to use those iterations to tuneup the parameters >> of the solver (precoditioner,type of linear solver, restart, etc...), Have >> someone working with that?, Do you know some papers where I can some >> information about that? >> >> Thank you for your time, >> >> Michel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Mon Feb 14 11:01:07 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Mon, 14 Feb 2011 12:01:07 -0500 Subject: [petsc-users] Trouble understanding -vec_view output. Message-ID: Hi, I am having trouble understanding the -vec_view output of the simple code I have pasted underneath. In it, I am just reading two PetscBinary files created with a stand alone code. one containing a matrix and another containing a vector. 
However on doing -vec_view during run-time, I get a sequence of zeros before the actual vector of the binary file is printed. But when I read the PetscBinary file in MATLAB I get the correct vector. Why does this happen? Is it because I am using vector type VECMPI to load the binary file (* ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); *)?? e.g. The ASCII text file (BEFORE converting to binary) with standalone code looks like 4 6 The output I get on -vec_view is 0 0 4 6 But with VecGetSize I get the vector length to be 2. !!! Thank you, Gaurish %--------------------------------------------- Code: int main(int argc,char **args) { Mat A ; Vec b ; PetscTruth flg_A,flg_b ; PetscErrorCode ierr ; PetscInt m,n,length ; char file_A[PETSC_MAX_PATH_LEN],file_b[PETSC_MAX_PATH_LEN] ; PetscViewer fd_A, fd_b ; PetscInitialize(&argc,&args,(char *)0,help); /* Get the option typed from the terminal */ ierr = PetscOptionsGetString(PETSC_NULL,"-matrix",file_A,PETSC_MAX_PATH_LEN-1,&flg_A);CHKERRQ(ierr); if (!flg_A) SETERRQ(1,"Must indicate binary matrix matrix file with the -matrix option"); ierr = PetscOptionsGetString(PETSC_NULL,"-vector",file_b,PETSC_MAX_PATH_LEN-1,&flg_b);CHKERRQ(ierr); if (!flg_b) SETERRQ(1,"Must indicate binary matrix matrix file with the -vector option"); /* Load the matrix and vector */ ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_A,FILE_MODE_READ,&fd_A);CHKERRQ(ierr); ierr = MatLoad(fd_A,MATMPIAIJ,&A);CHKERRQ(ierr); ierr = PetscViewerDestroy(fd_A);CHKERRQ(ierr); //ierr=MatView(A,PETSC_VIEWER_DRAW_WORLD);CHKERRQ(ierr); ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_b,FILE_MODE_READ,&fd_b);CHKERRQ(ierr); ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); ierr = PetscViewerDestroy(fd_b);CHKERRQ(ierr); /* Simple Cursory checks */ ierr = MatGetSize(A,&m,&n);CHKERRQ(ierr); ierr=PetscPrintf(PETSC_COMM_WORLD,"\n %i %i \n",m,n);CHKERRQ(ierr); ierr=VecGetSize(b,&length);CHKERRQ(ierr); ierr=PetscPrintf(PETSC_COMM_WORLD,"%i \n",length);CHKERRQ(ierr); //ierr=VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* Destroy Objects. */ MatDestroy(A); VecDestroy(b); ierr = PetscFinalize();CHKERRQ(ierr); sleep(4); return 0; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 14 11:11:23 2011 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Feb 2011 11:11:23 -0600 Subject: [petsc-users] Trouble understanding -vec_view output. In-Reply-To: References: Message-ID: On Mon, Feb 14, 2011 at 11:01 AM, Gaurish Telang wrote: > Hi, > > I am having trouble understanding the -vec_view output of the simple code I > have pasted underneath. In it, I am just reading two PetscBinary files > created with a stand alone code. one containing a matrix and another > containing a vector. > > However on doing -vec_view during run-time, I get a sequence of zeros > before the actual vector of the binary file is printed. But when I read the > PetscBinary file in MATLAB I get the correct vector. > > Why does this happen? Is it because I am using vector type VECMPI to load > the binary file (* ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); *)?? > > e.g. > > The ASCII text file (BEFORE converting to binary) with standalone code > looks like > > 4 > 6 > > The output I get on -vec_view is > > 0 > 0 > 4 > 6 > > But with VecGetSize I get the vector length to be 2. !!! > Send the entire output and input files to petsc-maint at mcs.anl.gov. I guarantee you that this is just misunderstanding, but its impossible to see exactly what you are doing from this sample. 
For instance, in parallel -vec_view will print Process [k]. Matt > Thank you, > > Gaurish > > > %--------------------------------------------- > Code: > int main(int argc,char **args) > { > Mat A ; > Vec b ; > PetscTruth flg_A,flg_b ; > PetscErrorCode ierr ; > PetscInt m,n,length ; > char > file_A[PETSC_MAX_PATH_LEN],file_b[PETSC_MAX_PATH_LEN] ; > PetscViewer fd_A, fd_b ; > > PetscInitialize(&argc,&args,(char *)0,help); > > /* Get the option typed from the terminal */ > ierr = > PetscOptionsGetString(PETSC_NULL,"-matrix",file_A,PETSC_MAX_PATH_LEN-1,&flg_A);CHKERRQ(ierr); > if (!flg_A) SETERRQ(1,"Must indicate binary matrix matrix file with the > -matrix option"); > > ierr = > PetscOptionsGetString(PETSC_NULL,"-vector",file_b,PETSC_MAX_PATH_LEN-1,&flg_b);CHKERRQ(ierr); > if (!flg_b) SETERRQ(1,"Must indicate binary matrix matrix file with the > -vector option"); > > /* Load the matrix and vector */ > ierr = > PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_A,FILE_MODE_READ,&fd_A);CHKERRQ(ierr); > ierr = MatLoad(fd_A,MATMPIAIJ,&A);CHKERRQ(ierr); > ierr = PetscViewerDestroy(fd_A);CHKERRQ(ierr); > > //ierr=MatView(A,PETSC_VIEWER_DRAW_WORLD);CHKERRQ(ierr); > > > ierr = > PetscViewerBinaryOpen(PETSC_COMM_WORLD,file_b,FILE_MODE_READ,&fd_b);CHKERRQ(ierr); > ierr = VecLoad(fd_b,VECMPI,&b);CHKERRQ(ierr); > ierr = PetscViewerDestroy(fd_b);CHKERRQ(ierr); > > /* Simple Cursory checks */ > ierr = MatGetSize(A,&m,&n);CHKERRQ(ierr); > ierr=PetscPrintf(PETSC_COMM_WORLD,"\n %i %i \n",m,n);CHKERRQ(ierr); > > ierr=VecGetSize(b,&length);CHKERRQ(ierr); > ierr=PetscPrintf(PETSC_COMM_WORLD,"%i \n",length);CHKERRQ(ierr); > //ierr=VecView(b,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > /* Destroy Objects. */ > MatDestroy(A); > VecDestroy(b); > > ierr = PetscFinalize();CHKERRQ(ierr); > > sleep(4); > > return 0; > } > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Mon Feb 14 11:13:29 2011 From: jed at 59A2.org (Jed Brown) Date: Mon, 14 Feb 2011 18:13:29 +0100 Subject: [petsc-users] Trouble understanding -vec_view output. In-Reply-To: References: Message-ID: On Mon, Feb 14, 2011 at 18:01, Gaurish Telang wrote: > 0 > 0 > 4 > 6 > > But with VecGetSize I get the vector length to be 2. !!! > Looks like the vector is printed twice, once when it was created and once after meaningful values were actually loaded. Does this still happen with the different loading model in petsc-dev? -------------- next part -------------- An HTML attachment was scrubbed... URL: From w_subber at yahoo.com Tue Feb 15 14:53:03 2011 From: w_subber at yahoo.com (Waad Subber) Date: Tue, 15 Feb 2011 12:53:03 -0800 (PST) Subject: [petsc-users] Create a matrix from a set of vectors Message-ID: <911695.8147.qm@web38204.mail.mud.yahoo.com> Hello, Can I create a matrix from a set of vectors without using VecGetValues and MatSetValues such as Vec? v1, v2 Mat? A A=[v1 v2] Thanks Waad -------------- next part -------------- An HTML attachment was scrubbed... 
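One way to do this, in line with the reply that follows (build a dense matrix and copy each vector straight into a column through the raw array), sketched for the sequential case. The function name is invented, PETSc's column-major dense storage with leading dimension m is assumed, and PetscArraycpy is from newer releases (PetscMemcpy with a byte count is the older equivalent).

#include <petscmat.h>

/* Build a dense m x 2 matrix whose columns are v1 and v2 (sequential case). */
static PetscErrorCode VecsToDense(Vec v1, Vec v2, Mat *A)
{
  const PetscScalar *a1, *a2;
  PetscScalar       *mat;
  PetscInt           m;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = VecGetSize(v1, &m);CHKERRQ(ierr);
  ierr = MatCreateSeqDense(PETSC_COMM_SELF, m, 2, NULL, A);CHKERRQ(ierr);

  ierr = VecGetArrayRead(v1, &a1);CHKERRQ(ierr);
  ierr = VecGetArrayRead(v2, &a2);CHKERRQ(ierr);
  ierr = MatDenseGetArray(*A, &mat);CHKERRQ(ierr);
  ierr = PetscArraycpy(mat,     a1, m);CHKERRQ(ierr);   /* column 0 (column-major storage) */
  ierr = PetscArraycpy(mat + m, a2, m);CHKERRQ(ierr);   /* column 1 */
  ierr = MatDenseRestoreArray(*A, &mat);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(v1, &a1);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(v2, &a2);CHKERRQ(ierr);

  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For a parallel MATMPIDENSE matrix the same copy works on each process's local block of rows, and newer releases also provide MatDenseGetColumnVecWrite so a column can be filled with a plain VecCopy.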
URL: From knepley at gmail.com Tue Feb 15 15:12:59 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Feb 2011 15:12:59 -0600 Subject: [petsc-users] Create a matrix from a set of vectors In-Reply-To: <911695.8147.qm@web38204.mail.mud.yahoo.com> References: <911695.8147.qm@web38204.mail.mud.yahoo.com> Message-ID: On Tue, Feb 15, 2011 at 2:53 PM, Waad Subber wrote: > Hello, > Can I create a matrix from a set of vectors without using VecGetValues and > MatSetValues such as > > Vec v1, v2 > Mat A > > A=[v1 v2] > This is a dense matrix. You can create a MatDense and pull out arrays to shared. Matt > Thanks > Waad > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bjha7333 at yahoo.com Tue Feb 15 18:56:26 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 16:56:26 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf Message-ID: <903282.93478.qm@web120515.mail.ne1.yahoo.com> Dear Petsc users, I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test Running test examples to verify correct installation --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licudata collect2: ld returned 1 exit status make[3]: [ex19] Error 1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F mpif90 -Wall -Wno-unused-variable -g -o ex5f 
ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licudata collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues for some time now. So it should be that, I suppose. I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: PetscMatlabEngine e; PetscScalar *array; array[0]=0; const char name[]="a"; PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); PetscMatlabEnginePutArray(e,1,1,array,name); PetscMatlabEngineGetArray(e,1,1,array,name); PetscMatlabEngineDestroy(e); Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. But, I get runtime error for mexPrintf: bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg Traceback (most recent call last): File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in from pylith.apps.PyLithApp import PyLithApp File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in from PetscApplication import PetscApplication File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in class PetscApplication(Application): File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication from pylith.utils.PetscManager import PetscManager File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in import pylith.utils.petsc as petsc File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in _petsc = swig_import_helper() File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper _mod = imp.load_module('_petsc', fp, pathname, description) ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf Can anyone help/suggest something? 
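One detail in the test snippet above that is worth fixing regardless of the linking problem: array is declared as a bare PetscScalar pointer and is written through before it points at any storage. A self-contained version of the same round-trip test is sketched below; it is only a sketch, assuming a PETSc build configured --with-matlab and run on a single process, and it follows the petsc-3.1-era calling conventions used elsewhere in this thread (later PETSc versions take &e in PetscMatlabEngineDestroy).

/* Minimal engine round-trip test (sketch; assumes --with-matlab). */
#include "petscsys.h"
#include "petscmatlab.h"   /* explicit include; may already be pulled in by the main PETSc headers */

int main(int argc,char **argv)
{
  PetscErrorCode    ierr;
  PetscMatlabEngine e;
  PetscScalar       array[1];          /* real storage, unlike the bare pointer above */

  array[0] = 0.0;
  ierr = PetscInitialize(&argc,&argv,(char*)0,(char*)0);CHKERRQ(ierr);
  ierr = PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e);CHKERRQ(ierr);
  ierr = PetscMatlabEnginePutArray(e,1,1,array,"a");CHKERRQ(ierr);
  ierr = PetscMatlabEngineEvaluate(e,"a = a + 1;");CHKERRQ(ierr);   /* run something in MATLAB so the round trip is visible */
  ierr = PetscMatlabEngineGetArray(e,1,1,array,"a");CHKERRQ(ierr);
  ierr = PetscMatlabEngineDestroy(e);CHKERRQ(ierr);                 /* petsc-3.1-era form used in this thread */
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

The PetscMatlabEngineEvaluate call is included only so the test visibly does something; the undefined mexPrintf symbol reported in the traceback above is a link/load problem and is independent of the code itself.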
Thanks a lot Bir From bsmith at mcs.anl.gov Tue Feb 15 19:56:02 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 19:56:02 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <903282.93478.qm@web120515.mail.ne1.yahoo.com> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> Message-ID: <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > Dear Petsc users, > > I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test > Running test examples to verify correct installation > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licudata > collect2: ld returned 1 exit status > make[3]: [ex19] Error 1 (ignored) > /bin/rm -f ex19.o > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F > mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib 
-L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licudata > collect2: ld returned 1 exit status > make[3]: [ex5f] Error 1 (ignored) > /bin/rm -f ex5f.o > Completed test examples > > > It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues > for some time now. So it should be that, I suppose. > > I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: > > PetscMatlabEngine e; > PetscScalar *array; array[0]=0; > const char name[]="a"; > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > PetscMatlabEnginePutArray(e,1,1,array,name); > PetscMatlabEngineGetArray(e,1,1,array,name); > PetscMatlabEngineDestroy(e); > > Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? No > Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. > > But, I get runtime error for mexPrintf: > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg > Traceback (most recent call last): > File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in > from pylith.apps.PyLithApp import PyLithApp > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in > from PetscApplication import PetscApplication > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in > class PetscApplication(Application): > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication > from pylith.utils.PetscManager import PetscManager > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in > import pylith.utils.petsc as petsc > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in > _petsc = swig_import_helper() > File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper > _mod = imp.load_module('_petsc', fp, pathname, description) > ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? Barry > > > Can anyone help/suggest something? > > Thanks a lot > Bir > > > From bjha7333 at yahoo.com Tue Feb 15 20:14:11 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 18:14:11 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf Message-ID: <787767.48972.qm@web120505.mail.ne1.yahoo.com> Hi, I attached the output of ls -l. 
Below are the outputs of "file" command: bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: symbolic link to `libicudata.so.42.1' bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped Thanks Bir --- On Wed, 2/16/11, Barry Smith wrote: > From: Barry Smith > Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf > To: "PETSc users list" > Date: Wednesday, February 16, 2011, 7:26 AM > > On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > > > Dear Petsc users, > > > > I am getting "cannot find -licudata" error during > "make test" on petsc-dev, even when libicudata.so.42.1, and > its link, libicudata.so.42 are in > /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. > > ? Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and > send the output > also run file? > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > > > > bjha at ubuntu:~/src/petsc-dev$ make > PETSC_DIR=/home/bjha/src/petsc-dev > PETSC_ARCH=linux_gcc-4.4.1_64 test > > Running test examples to verify correct installation > > --------------Error detected during compile or > link!----------------------- > > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > > mpicxx -o ex19.o -c -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g > -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 > -I/home/bjha/src/petsc-dev/include > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > -I/home/bjha/src/petsc-dev/include/sieve > -I/home/bjha/MATLAB/R2010b/extern/include > -I/home/bjha/tools/gcc-4.4.1_64/include > -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -g???-o ex19? > ex19.o > -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib? > -lpetsc > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > -lparmetis -lmetis > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > -L/home/bjha/MATLAB/R2010b/bin/glnx86 > -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex > -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco > -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas > -L/home/bjha/tools/gcc-4.4.1_64/lib > -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal > -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 > -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx > -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil > -lgcc_s -lpthread -ldl > > /usr/bin/ld: cannot find -licudata > > collect2: ld returned 1 exit status > > make[3]: [ex19] Error 1 (ignored) > > /bin/rm -f ex19.o > > --------------Error detected during compile or > link!----------------------- > > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > > mpif90 -c? -Wall -Wno-unused-variable > -g???-I/home/bjha/src/petsc-dev/include > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > -I/home/bjha/src/petsc-dev/include/sieve > -I/home/bjha/MATLAB/R2010b/extern/include > -I/home/bjha/tools/gcc-4.4.1_64/include? ? -o > ex5f.o ex5f.F > > mpif90 -Wall -Wno-unused-variable > -g???-o ex5f ex5f.o > -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib? 
> -lpetsc > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > -lparmetis -lmetis > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > -L/home/bjha/MATLAB/R2010b/bin/glnx86 > -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex > -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco > -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas > -L/home/bjha/tools/gcc-4.4.1_64/lib > -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal > -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 > -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx > -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil > -lgcc_s -lpthread -ldl > > /usr/bin/ld: cannot find -licudata > > collect2: ld returned 1 exit status > > make[3]: [ex5f] Error 1 (ignored) > > /bin/rm -f ex5f.o > > Completed test examples > > > > > > It is correct that I didn't install matlab in the > default directory (/usr/local) because I had some permission > issues on Ubuntu. But I have been running matlab (by running > /MATLAB/R2010b/bin/matlab.sh) without any issues > > for some time now. So it should be that, I suppose. > > > > I still went ahead with compiling my application with > few lines of PetscMatlabEngine functions, just to test: > > > >? PetscMatlabEngine e; > >? PetscScalar *array; array[0]=0; > >? const char name[]="a"; > >? > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > >? PetscMatlabEnginePutArray(e,1,1,array,name); > >? PetscMatlabEngineGetArray(e,1,1,array,name); > >? PetscMatlabEngineDestroy(e); > > > > Do I need to include any header file (e.g. > petscmatlab.h) in the header of my class file? > > ? No > > > Right now, the application compiled (make, make > install) fine without any such include file. The application > have been using Petsc for its solver without any issues, so > it includes all the necessary files. I just want to exten > the application to call some matlab scripts by using > PetscMatlabEngine. > > > > But, I get runtime error for mexPrintf: > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith > step06_pres.cfg > > Traceback (most recent call last): > >? File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", > line 37, in > >? ? from pylith.apps.PyLithApp import > PyLithApp > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", > line 23, in > >? ? from PetscApplication import > PetscApplication > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > line 27, in > >? ? class PetscApplication(Application): > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > line 41, in PetscApplication > >? ? from pylith.utils.PetscManager import > PetscManager > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", > line 29, in > >? ? import pylith.utils.petsc as petsc > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > line 25, in > >? ? _petsc = swig_import_helper() > >? File > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > line 21, in swig_import_helper > >? ? _mod = imp.load_module('_petsc', fp, > pathname, description) > > ImportError: > /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined > symbol: mexPrintf > > ? 
Somehow all the Matlab shared libraries need to be > found when python loads libpylith.so is loaded.? I > don't know how this is done in Linux. It is really a python > question if you want to use a dynamic library in python that > uses another shared library how do you make sure python gets > all the shared libraries loaded to resolve the symbols? > > ???Barry > > > > > > > Can anyone help/suggest something? > > > > Thanks a lot > > Bir > > > > > > > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: MATLAB_R2010b_bin_glnx86_files.txt URL: From bsmith at mcs.anl.gov Tue Feb 15 20:25:11 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 20:25:11 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <787767.48972.qm@web120505.mail.ne1.yahoo.com> References: <787767.48972.qm@web120505.mail.ne1.yahoo.com> Message-ID: <426D3650-BA63-48A7-A8ED-09D9AB96D8E3@mcs.anl.gov> On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: > Hi, > > I attached the output of ls -l. Below are the outputs of "file" command: > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: symbolic link to `libicudata.so.42.1' > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped Humm. Run file on some of the other libraries in that directory. Are they also ELF 32 bit? You can try editing ${PETSC_ARCH/conf/petscvariables and removing the reference to libicudata then run make test. It may not be needed. Barry > > > Thanks > Bir > > --- On Wed, 2/16/11, Barry Smith wrote: > >> From: Barry Smith >> Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf >> To: "PETSc users list" >> Date: Wednesday, February 16, 2011, 7:26 AM >> >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >> >>> Dear Petsc users, >>> >>> I am getting "cannot find -licudata" error during >> "make test" on petsc-dev, even when libicudata.so.42.1, and >> its link, libicudata.so.42 are in >> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. 
>> >> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and >> send the output >> also run file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >> >>> >>> bjha at ubuntu:~/src/petsc-dev$ make >> PETSC_DIR=/home/bjha/src/petsc-dev >> PETSC_ARCH=linux_gcc-4.4.1_64 test >>> Running test examples to verify correct installation >>> --------------Error detected during compile or >> link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings >> -Wno-strict-aliasing -Wno-unknown-pragmas -g >> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 >> -I/home/bjha/src/petsc-dev/include >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >> -I/home/bjha/src/petsc-dev/include/sieve >> -I/home/bjha/MATLAB/R2010b/extern/include >> -I/home/bjha/tools/gcc-4.4.1_64/include >> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing >> -Wno-unknown-pragmas -g -o ex19 >> ex19.o >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lpetsc >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lparmetis -lmetis >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas >> -L/home/bjha/tools/gcc-4.4.1_64/lib >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil >> -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex19] Error 1 (ignored) >>> /bin/rm -f ex19.o >>> --------------Error detected during compile or >> link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpif90 -c -Wall -Wno-unused-variable >> -g -I/home/bjha/src/petsc-dev/include >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >> -I/home/bjha/src/petsc-dev/include/sieve >> -I/home/bjha/MATLAB/R2010b/extern/include >> -I/home/bjha/tools/gcc-4.4.1_64/include -o >> ex5f.o ex5f.F >>> mpif90 -Wall -Wno-unused-variable >> -g -o ex5f ex5f.o >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lpetsc >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> -lparmetis -lmetis >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas >> -L/home/bjha/tools/gcc-4.4.1_64/lib >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil >> -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex5f] Error 1 (ignored) >>> /bin/rm -f ex5f.o >>> Completed test examples >>> >>> >>> It is correct that I didn't install matlab in the >> default 
directory (/usr/local) because I had some permission >> issues on Ubuntu. But I have been running matlab (by running >> /MATLAB/R2010b/bin/matlab.sh) without any issues >>> for some time now. So it should be that, I suppose. >>> >>> I still went ahead with compiling my application with >> few lines of PetscMatlabEngine functions, just to test: >>> >>> PetscMatlabEngine e; >>> PetscScalar *array; array[0]=0; >>> const char name[]="a"; >>> >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>> PetscMatlabEnginePutArray(e,1,1,array,name); >>> PetscMatlabEngineGetArray(e,1,1,array,name); >>> PetscMatlabEngineDestroy(e); >>> >>> Do I need to include any header file (e.g. >> petscmatlab.h) in the header of my class file? >> >> No >> >>> Right now, the application compiled (make, make >> install) fine without any such include file. The application >> have been using Petsc for its solver without any issues, so >> it includes all the necessary files. I just want to exten >> the application to call some matlab scripts by using >> PetscMatlabEngine. >>> >>> But, I get runtime error for mexPrintf: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith >> step06_pres.cfg >>> Traceback (most recent call last): >>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", >> line 37, in >>> from pylith.apps.PyLithApp import >> PyLithApp >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", >> line 23, in >>> from PetscApplication import >> PetscApplication >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >> line 27, in >>> class PetscApplication(Application): >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >> line 41, in PetscApplication >>> from pylith.utils.PetscManager import >> PetscManager >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", >> line 29, in >>> import pylith.utils.petsc as petsc >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >> line 25, in >>> _petsc = swig_import_helper() >>> File >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >> line 21, in swig_import_helper >>> _mod = imp.load_module('_petsc', fp, >> pathname, description) >>> ImportError: >> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined >> symbol: mexPrintf >> >> Somehow all the Matlab shared libraries need to be >> found when python loads libpylith.so is loaded. I >> don't know how this is done in Linux. It is really a python >> question if you want to use a dynamic library in python that >> uses another shared library how do you make sure python gets >> all the shared libraries loaded to resolve the symbols? >> >> Barry >> >>> >>> >>> Can anyone help/suggest something? 
>>> >>> Thanks a lot >>> Bir >>> >>> >>> >> >> > > > From baagaard at usgs.gov Tue Feb 15 20:44:43 2011 From: baagaard at usgs.gov (Brad Aagaard) Date: Tue, 15 Feb 2011 18:44:43 -0800 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> Message-ID: <4D5B3A1B.3040607@usgs.gov> Birendra- PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. Brad On 2/15/11 5:56 PM, Barry Smith wrote: > > On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > >> Dear Petsc users, >> >> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. > > Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output > also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > >> >> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >> Running test examples to verify correct installation >> --------------Error detected during compile or link!----------------------- >> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >> /usr/bin/ld: cannot find -licudata >> collect2: ld returned 1 exit status >> make[3]: [ex19] Error 1 (ignored) >> /bin/rm -f ex19.o >> --------------Error detected during compile or link!----------------------- >> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >> mpif90 -Wall 
-Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >> /usr/bin/ld: cannot find -licudata >> collect2: ld returned 1 exit status >> make[3]: [ex5f] Error 1 (ignored) >> /bin/rm -f ex5f.o >> Completed test examples >> >> >> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >> for some time now. So it should be that, I suppose. >> >> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >> >> PetscMatlabEngine e; >> PetscScalar *array; array[0]=0; >> const char name[]="a"; >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >> PetscMatlabEnginePutArray(e,1,1,array,name); >> PetscMatlabEngineGetArray(e,1,1,array,name); >> PetscMatlabEngineDestroy(e); >> >> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? > > No > >> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. >> >> But, I get runtime error for mexPrintf: >> >> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >> Traceback (most recent call last): >> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >> from pylith.apps.PyLithApp import PyLithApp >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >> from PetscApplication import PetscApplication >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >> class PetscApplication(Application): >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >> from pylith.utils.PetscManager import PetscManager >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >> import pylith.utils.petsc as petsc >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >> _petsc = swig_import_helper() >> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >> _mod = imp.load_module('_petsc', fp, pathname, description) >> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf > > Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. 
It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? > > Barry > >> >> >> Can anyone help/suggest something? >> >> Thanks a lot >> Bir >> >> >> > > From bsmith at mcs.anl.gov Tue Feb 15 21:00:30 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Feb 2011 21:00:30 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <4D5B3A1B.3040607@usgs.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> <4D5B3A1B.3040607@usgs.gov> Message-ID: <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> The MATLAB libraries are listed in the $PETSC_ARCH/conf/petscvariables file in the variable PETSC_EXTERNAL_LIB_BASIC This was changed recently in PETSc-dev perhaps the pylith configure has not yet been updated to handle this. Barry On Feb 15, 2011, at 8:44 PM, Brad Aagaard wrote: > Birendra- > > PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. > > Brad > > > On 2/15/11 5:56 PM, Barry Smith wrote: >> >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >> >>> Dear Petsc users, >>> >>> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. 
>> >> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output >> also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >> >>> >>> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >>> Running test examples to verify correct installation >>> --------------Error detected during compile or link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex19] Error 1 (ignored) >>> /bin/rm -f ex19.o >>> --------------Error detected during compile or link!----------------------- >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >>> mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>> /usr/bin/ld: cannot find -licudata >>> collect2: ld returned 1 exit status >>> make[3]: [ex5f] Error 1 (ignored) >>> /bin/rm -f ex5f.o >>> Completed test examples >>> >>> >>> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. 
But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >>> for some time now. So it should be that, I suppose. >>> >>> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >>> >>> PetscMatlabEngine e; >>> PetscScalar *array; array[0]=0; >>> const char name[]="a"; >>> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>> PetscMatlabEnginePutArray(e,1,1,array,name); >>> PetscMatlabEngineGetArray(e,1,1,array,name); >>> PetscMatlabEngineDestroy(e); >>> >>> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? >> >> No >> >>> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. >>> >>> But, I get runtime error for mexPrintf: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >>> Traceback (most recent call last): >>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >>> from pylith.apps.PyLithApp import PyLithApp >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >>> from PetscApplication import PetscApplication >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >>> class PetscApplication(Application): >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >>> from pylith.utils.PetscManager import PetscManager >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >>> import pylith.utils.petsc as petsc >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >>> _petsc = swig_import_helper() >>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >>> _mod = imp.load_module('_petsc', fp, pathname, description) >>> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf >> >> Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? >> >> Barry >> >>> >>> >>> Can anyone help/suggest something? >>> >>> Thanks a lot >>> Bir >>> >>> >>> >> >> > From baagaard at usgs.gov Tue Feb 15 21:51:36 2011 From: baagaard at usgs.gov (Brad Aagaard) Date: Tue, 15 Feb 2011 19:51:36 -0800 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> References: <903282.93478.qm@web120515.mail.ne1.yahoo.com> <73BFAA8A-C2AF-468D-A227-EE35BFFAC52F@mcs.anl.gov> <4D5B3A1B.3040607@usgs.gov> <7A4CE949-A06D-499B-8EE3-9269AFA4EFC6@mcs.anl.gov> Message-ID: <4D5B49C8.10102@usgs.gov> Barry- In my current petsc-dev $PETSC_ARCH/conf/petscvariables, I have PETSC_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB} Shouldn't PETSC_LIB include all the libraries and paths we need for linking? 
Thanks, Brad On 2/15/11 7:00 PM, Barry Smith wrote: > > The MATLAB libraries are listed in the $PETSC_ARCH/conf/petscvariables file in the variable PETSC_EXTERNAL_LIB_BASIC This was changed recently in PETSc-dev perhaps the pylith configure has not yet been updated to handle this. > > Barry > > On Feb 15, 2011, at 8:44 PM, Brad Aagaard wrote: > >> Birendra- >> >> PyLith extracts the link flags from PETSc during the PyLith configure, so any libraries PETSc uses should automatically get linked into the PyLith Python modules and libraries. Only if these libraries end up in some unusual PETSc make related variable would they be missed in the PyLith linking. You can also run "ldd libpylith.so" in the directory containing libpylith.so to make sure it is finding the shared libraries. >> >> Brad >> >> >> On 2/15/11 5:56 PM, Barry Smith wrote: >>> >>> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >>> >>>> Dear Petsc users, >>>> >>>> I am getting "cannot find -licudata" error during "make test" on petsc-dev, even when libicudata.so.42.1, and its link, libicudata.so.42 are in /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" was fine. >>> >>> Run ls -l /home/bjha/MATLAB/R2010b/bin/glnx86 and send the output >>> also run file /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>> >>>> >>>> bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test >>>> Running test examples to verify correct installation >>>> --------------Error detected during compile or link!----------------------- >>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>> mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>>> mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>>> /usr/bin/ld: cannot find -licudata >>>> collect2: ld returned 1 exit status >>>> make[3]: [ex19] Error 1 (ignored) >>>> /bin/rm -f ex19.o >>>> --------------Error detected during compile or link!----------------------- >>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>> mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F >>>> mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o 
-L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl >>>> /usr/bin/ld: cannot find -licudata >>>> collect2: ld returned 1 exit status >>>> make[3]: [ex5f] Error 1 (ignored) >>>> /bin/rm -f ex5f.o >>>> Completed test examples >>>> >>>> >>>> It is correct that I didn't install matlab in the default directory (/usr/local) because I had some permission issues on Ubuntu. But I have been running matlab (by running /MATLAB/R2010b/bin/matlab.sh) without any issues >>>> for some time now. So it should be that, I suppose. >>>> >>>> I still went ahead with compiling my application with few lines of PetscMatlabEngine functions, just to test: >>>> >>>> PetscMatlabEngine e; >>>> PetscScalar *array; array[0]=0; >>>> const char name[]="a"; >>>> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>>> PetscMatlabEnginePutArray(e,1,1,array,name); >>>> PetscMatlabEngineGetArray(e,1,1,array,name); >>>> PetscMatlabEngineDestroy(e); >>>> >>>> Do I need to include any header file (e.g. petscmatlab.h) in the header of my class file? >>> >>> No >>> >>>> Right now, the application compiled (make, make install) fine without any such include file. The application have been using Petsc for its solver without any issues, so it includes all the necessary files. I just want to exten the application to call some matlab scripts by using PetscMatlabEngine. 
>>>> >>>> But, I get runtime error for mexPrintf: >>>> >>>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ pylith step06_pres.cfg >>>> Traceback (most recent call last): >>>> File "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", line 37, in >>>> from pylith.apps.PyLithApp import PyLithApp >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", line 23, in >>>> from PetscApplication import PetscApplication >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 27, in >>>> class PetscApplication(Application): >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", line 41, in PetscApplication >>>> from pylith.utils.PetscManager import PetscManager >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", line 29, in >>>> import pylith.utils.petsc as petsc >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 25, in >>>> _petsc = swig_import_helper() >>>> File "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", line 21, in swig_import_helper >>>> _mod = imp.load_module('_petsc', fp, pathname, description) >>>> ImportError: /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: undefined symbol: mexPrintf >>> >>> Somehow all the Matlab shared libraries need to be found when python loads libpylith.so is loaded. I don't know how this is done in Linux. It is really a python question if you want to use a dynamic library in python that uses another shared library how do you make sure python gets all the shared libraries loaded to resolve the symbols? >>> >>> Barry >>> >>>> >>>> >>>> Can anyone help/suggest something? >>>> >>>> Thanks a lot >>>> Bir >>>> >>>> >>>> >>> >>> >> > > From bjha7333 at yahoo.com Wed Feb 16 01:35:00 2011 From: bjha7333 at yahoo.com (Birendra jha) Date: Tue, 15 Feb 2011 23:35:00 -0800 (PST) Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <426D3650-BA63-48A7-A8ED-09D9AB96D8E3@mcs.anl.gov> Message-ID: <856558.58586.qm@web120507.mail.ne1.yahoo.com> Hi, I removed -licudata at three places in petscvariables. 
The error just shifted to -licui18n: bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test Running test examples to verify correct installation --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licui18n collect2: ld returned 1 exit status make[3]: [ex19] Error 1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl /usr/bin/ld: cannot find -licui18n collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples I removed its references, error shifted to -licuuc, removed its references, error became: ... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib/libpetsc.a(aij.o): In function `MatCreate_SeqAIJ': /home/bjha/src/petsc-dev/src/mat/impls/aij/seq/aij.c:3492: undefined reference to `MatGetFactor_seqaij_matlab' ... It seems a linking issue. 
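A quick way to see exactly what the dynamic loader objects to (a library it cannot find, an unresolved symbol such as mexPrintf, or a 32-bit library being loaded into a 64-bit process, which glibc reports as "wrong ELF class: ELFCLASS32") is to dlopen the libraries directly and print dlerror(). The small stand-alone diagnostic sketched below does that; it is not part of PETSc or PyLith, and the build line and paths in the comments are only examples.

/* dltest.c: try to dlopen each library named on the command line, the same
   way Python does when it imports a module, and report the loader's error.
   Build (example): cc -o dltest dltest.c -ldl */
#include <stdio.h>
#include <dlfcn.h>

int main(int argc,char **argv)
{
  int i;
  for (i = 1; i < argc; i++) {
    /* RTLD_GLOBAL makes symbols from libraries loaded earlier on the command
       line (e.g. libmex.so) visible to the ones loaded after them */
    void *handle = dlopen(argv[i], RTLD_NOW | RTLD_GLOBAL);
    if (!handle) fprintf(stderr, "FAILED %s\n  %s\n", argv[i], dlerror());
    else         printf("loaded %s\n", argv[i]);
  }
  return 0;
}

Running it on /home/bjha/MATLAB/R2010b/bin/glnx86/libmex.so followed by libpylith.so.0 should show whether the mexPrintf reference can be resolved once the MATLAB library is loaded, or whether the loader refuses the 32-bit MATLAB libraries outright.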
I checked "file" for few other libraries--they are all ELF 32-bit: bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwblas.so libmwblas.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libjogl.so libjogl.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwfftw.so libmwfftw.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libeng.so libeng.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped PyLith configure also gives Petsc linking error: checking for PETSc dir... /home/bjha/src/petsc-dev checking for PETSc arch... linux_gcc-4.4.1_64 checking for PETSc config... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/conf/petscvariables checking for PETSc version == 3.1.0... yes checking for PetscInitialize... no checking for the libraries used by mpicc... -pthread -L/home/bjha/tools/gcc-4.4.1_64/lib -lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -lm -ldl checking for PetscInitialize... no configure: error: cannot link against PETSc libraries Please help. Thanks & regards Bir --- On Wed, 2/16/11, Barry Smith wrote: > From: Barry Smith > Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf > To: "PETSc users list" > Date: Wednesday, February 16, 2011, 7:55 AM > > On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: > > > Hi, > > > > I attached the output of ls -l. Below are the outputs > of "file" command: > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: > symbolic link to `libicudata.so.42.1' > > > > bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 > > > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF > 32-bit LSB shared object, Intel 80386, version 1 (SYSV), > dynamically linked, not stripped > > ???Humm. Run file on some of the other > libraries in that directory.? Are they also ELF 32 bit? > You can try editing ${PETSC_ARCH/conf/petscvariables and > removing the reference to libicudata then run make test. It > may not be needed. > > ? Barry > > > > > > > Thanks > > Bir > > > > --- On Wed, 2/16/11, Barry Smith > wrote: > > > >> From: Barry Smith > >> Subject: Re: [petsc-users] Petscmatlabengine, > libicudata not found, undefined symbol mexPrintf > >> To: "PETSc users list" > >> Date: Wednesday, February 16, 2011, 7:26 AM > >> > >> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: > >> > >>> Dear Petsc users, > >>> > >>> I am getting "cannot find -licudata" error > during > >> "make test" on petsc-dev, even when > libicudata.so.42.1, and > >> its link, libicudata.so.42 are in > >> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" > was fine. 
> >> > >>???Run ls -l > /home/bjha/MATLAB/R2010b/bin/glnx86 and > >> send the output > >> also run file > >> > /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 > >> > >>> > >>> bjha at ubuntu:~/src/petsc-dev$ make > >> PETSC_DIR=/home/bjha/src/petsc-dev > >> PETSC_ARCH=linux_gcc-4.4.1_64 test > >>> Running test examples to verify correct > installation > >>> --------------Error detected during compile > or > >> link!----------------------- > >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > >>> mpicxx -o ex19.o -c -Wall -Wwrite-strings > >> -Wno-strict-aliasing -Wno-unknown-pragmas -g > >> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 > >> -I/home/bjha/src/petsc-dev/include > >> > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > >> -I/home/bjha/src/petsc-dev/include/sieve > >> -I/home/bjha/MATLAB/R2010b/extern/include > >> -I/home/bjha/tools/gcc-4.4.1_64/include > >> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > >>> mpicxx -Wall -Wwrite-strings > -Wno-strict-aliasing > >> -Wno-unknown-pragmas -g???-o ex19 > >> ex19.o > >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > > >> -lpetsc > >> > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > >> -lparmetis -lmetis > >> > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 > >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng > -lmex > >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml > -lchaco > >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas > -lblas > >> -L/home/bjha/tools/gcc-4.4.1_64/lib > >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte > -lopen-pal > >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 > -lmpi_f77 > >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ > -lmpi_cxx > >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl > -lutil > >> -lgcc_s -lpthread -ldl > >>> /usr/bin/ld: cannot find -licudata > >>> collect2: ld returned 1 exit status > >>> make[3]: [ex19] Error 1 (ignored) > >>> /bin/rm -f ex19.o > >>> --------------Error detected during compile > or > >> link!----------------------- > >>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > >>> mpif90 -c? -Wall -Wno-unused-variable > >> > -g???-I/home/bjha/src/petsc-dev/include > >> > -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include > >> -I/home/bjha/src/petsc-dev/include/sieve > >> -I/home/bjha/MATLAB/R2010b/extern/include > >> -I/home/bjha/tools/gcc-4.4.1_64/include? > ? 
-o > >> ex5f.o ex5f.F > >>> mpif90 -Wall -Wno-unused-variable > >> -g???-o ex5f ex5f.o > >> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > > >> -lpetsc > >> > -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib > >> -lparmetis -lmetis > >> > -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 > >> -L/home/bjha/MATLAB/R2010b/bin/glnx86 > >> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng > -lmex > >> -lmx -lmat -lut -licudata -licui18n -licuuc -lml > -lchaco > >> -L/usr/lib/atlas -llapack_atlas -llapack -latlas > -lblas > >> -L/home/bjha/tools/gcc-4.4.1_64/lib > >> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 > >> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte > -lopen-pal > >> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 > -lmpi_f77 > >> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ > -lmpi_cxx > >> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl > -lutil > >> -lgcc_s -lpthread -ldl > >>> /usr/bin/ld: cannot find -licudata > >>> collect2: ld returned 1 exit status > >>> make[3]: [ex5f] Error 1 (ignored) > >>> /bin/rm -f ex5f.o > >>> Completed test examples > >>> > >>> > >>> It is correct that I didn't install matlab in > the > >> default directory (/usr/local) because I had some > permission > >> issues on Ubuntu. But I have been running matlab > (by running > >> /MATLAB/R2010b/bin/matlab.sh) without any issues > >>> for some time now. So it should be that, I > suppose. > >>> > >>> I still went ahead with compiling my > application with > >> few lines of PetscMatlabEngine functions, just to > test: > >>> > >>>???PetscMatlabEngine e; > >>>???PetscScalar *array; > array[0]=0; > >>>???const char name[]="a"; > >>>? > >> > PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); > >>>???PetscMatlabEnginePutArray(e,1,1,array,name); > >>>???PetscMatlabEngineGetArray(e,1,1,array,name); > >>>???PetscMatlabEngineDestroy(e); > >>> > >>> Do I need to include any header file (e.g. > >> petscmatlab.h) in the header of my class file? > >> > >>???No > >> > >>> Right now, the application compiled (make, > make > >> install) fine without any such include file. The > application > >> have been using Petsc for its solver without any > issues, so > >> it includes all the necessary files. I just want > to exten > >> the application to call some matlab scripts by > using > >> PetscMatlabEngine. > >>> > >>> But, I get runtime error for mexPrintf: > >>> > >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ > pylith > >> step06_pres.cfg > >>> Traceback (most recent call last): > >>>???File > "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", > >> line 37, in > >>>? ???from > pylith.apps.PyLithApp import > >> PyLithApp > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", > >> line 23, in > >>>? ???from PetscApplication > import > >> PetscApplication > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > >> line 27, in > >>>? ???class > PetscApplication(Application): > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", > >> line 41, in PetscApplication > >>>? ???from > pylith.utils.PetscManager import > >> PetscManager > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", > >> line 29, in > >>>? 
???import > pylith.utils.petsc as petsc > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > >> line 25, in > >>>? ???_petsc = > swig_import_helper() > >>>???File > >> > "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", > >> line 21, in swig_import_helper > >>>? ???_mod = > imp.load_module('_petsc', fp, > >> pathname, description) > >>> ImportError: > >> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: > undefined > >> symbol: mexPrintf > >> > >>???Somehow all the Matlab shared > libraries need to be > >> found when python loads libpylith.so is > loaded.? I > >> don't know how this is done in Linux. It is really > a python > >> question if you want to use a dynamic library in > python that > >> uses another shared library how do you make sure > python gets > >> all the shared libraries loaded to resolve the > symbols? > >> > >>? ? Barry > >> > >>> > >>> > >>> Can anyone help/suggest something? > >>> > >>> Thanks a lot > >>> Bir > >>> > >>> > >>> > >> > >> > > > > > > > > From bsmith at mcs.anl.gov Wed Feb 16 07:57:42 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 07:57:42 -0600 Subject: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf In-Reply-To: <856558.58586.qm@web120507.mail.ne1.yahoo.com> References: <856558.58586.qm@web120507.mail.ne1.yahoo.com> Message-ID: On Feb 16, 2011, at 1:35 AM, Birendra jha wrote: > Hi, > > I removed -licudata at three places in petscvariables. The error just shifted to -licui18n: > > bjha at ubuntu:~/src/petsc-dev$ make PETSC_DIR=/home/bjha/src/petsc-dev PETSC_ARCH=linux_gcc-4.4.1_64 test > Running test examples to verify correct installation > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpicxx -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/bjha/src/petsc-dev/include -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c > mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -o ex19 ex19.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licui18n > collect2: ld returned 1 exit status > make[3]: [ex19] Error 1 (ignored) > /bin/rm -f ex19.o > --------------Error detected during compile or link!----------------------- > See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html > mpif90 -c -Wall -Wno-unused-variable -g -I/home/bjha/src/petsc-dev/include 
-I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include -I/home/bjha/src/petsc-dev/include/sieve -I/home/bjha/MATLAB/R2010b/extern/include -I/home/bjha/tools/gcc-4.4.1_64/include -o ex5f.o ex5f.F > mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lpetsc -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib -lparmetis -lmetis -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -L/home/bjha/MATLAB/R2010b/bin/glnx86 -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng -lmex -lmx -lmat -lut -licui18n -licuuc -lml -lchaco -L/usr/lib/atlas -llapack_atlas -llapack -latlas -lblas -L/home/bjha/tools/gcc-4.4.1_64/lib -L/usr/lib/gcc/i486-linux-gnu/4.4.3 -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl > /usr/bin/ld: cannot find -licui18n > collect2: ld returned 1 exit status > make[3]: [ex5f] Error 1 (ignored) > /bin/rm -f ex5f.o > Completed test examples > > > I removed its references, error shifted to -licuuc, removed its references, error became: > ... > /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib/libpetsc.a(aij.o): In function `MatCreate_SeqAIJ': > /home/bjha/src/petsc-dev/src/mat/impls/aij/seq/aij.c:3492: undefined reference to `MatGetFactor_seqaij_matlab' Send configure.log and make.log to petsc-maint at mcs.anl.gov Looks like those two libraries are not needed, but something went wrong with the PETSc compile. Barry > ... > > It seems a linking issue. > > I checked "file" for few other libraries--they are all ELF 32-bit: > > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwblas.so > libmwblas.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libjogl.so > libjogl.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libmwfftw.so > libmwfftw.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > bjha at ubuntu:~/MATLAB/R2010b/bin/glnx86$ file libeng.so > libeng.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, stripped > > PyLith configure also gives Petsc linking error: > > checking for PETSc dir... /home/bjha/src/petsc-dev > checking for PETSc arch... linux_gcc-4.4.1_64 > checking for PETSc config... /home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/conf/petscvariables > checking for PETSc version == 3.1.0... yes > checking for PetscInitialize... no > checking for the libraries used by mpicc... -pthread -L/home/bjha/tools/gcc-4.4.1_64/lib -lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -lm -ldl > checking for PetscInitialize... no > configure: error: cannot link against PETSc libraries > > Please help. > > Thanks & regards > Bir > > --- On Wed, 2/16/11, Barry Smith wrote: > >> From: Barry Smith >> Subject: Re: [petsc-users] Petscmatlabengine, libicudata not found, undefined symbol mexPrintf >> To: "PETSc users list" >> Date: Wednesday, February 16, 2011, 7:55 AM >> >> On Feb 15, 2011, at 8:14 PM, Birendra jha wrote: >> >>> Hi, >>> >>> I attached the output of ls -l. 
Below are the outputs >> of "file" command: >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42: >> symbolic link to `libicudata.so.42.1' >>> >>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ file >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1 >>> >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42.1: ELF >> 32-bit LSB shared object, Intel 80386, version 1 (SYSV), >> dynamically linked, not stripped >> >> Humm. Run file on some of the other >> libraries in that directory. Are they also ELF 32 bit? >> You can try editing ${PETSC_ARCH/conf/petscvariables and >> removing the reference to libicudata then run make test. It >> may not be needed. >> >> Barry >> >>> >>> >>> Thanks >>> Bir >>> >>> --- On Wed, 2/16/11, Barry Smith >> wrote: >>> >>>> From: Barry Smith >>>> Subject: Re: [petsc-users] Petscmatlabengine, >> libicudata not found, undefined symbol mexPrintf >>>> To: "PETSc users list" >>>> Date: Wednesday, February 16, 2011, 7:26 AM >>>> >>>> On Feb 15, 2011, at 6:56 PM, Birendra jha wrote: >>>> >>>>> Dear Petsc users, >>>>> >>>>> I am getting "cannot find -licudata" error >> during >>>> "make test" on petsc-dev, even when >> libicudata.so.42.1, and >>>> its link, libicudata.so.42 are in >>>> /home/bjha/MATLAB/R2010b/bin/glnx86. Petsc "make" >> was fine. >>>> >>>> Run ls -l >> /home/bjha/MATLAB/R2010b/bin/glnx86 and >>>> send the output >>>> also run file >>>> >> /home/bjha/MATLAB/R2010b/bin/glnx86/libicudata.so.42 >>>> >>>>> >>>>> bjha at ubuntu:~/src/petsc-dev$ make >>>> PETSC_DIR=/home/bjha/src/petsc-dev >>>> PETSC_ARCH=linux_gcc-4.4.1_64 test >>>>> Running test examples to verify correct >> installation >>>>> --------------Error detected during compile >> or >>>> link!----------------------- >>>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>>> mpicxx -o ex19.o -c -Wall -Wwrite-strings >>>> -Wno-strict-aliasing -Wno-unknown-pragmas -g >>>> -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 >>>> -I/home/bjha/src/petsc-dev/include >>>> >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >>>> -I/home/bjha/src/petsc-dev/include/sieve >>>> -I/home/bjha/MATLAB/R2010b/extern/include >>>> -I/home/bjha/tools/gcc-4.4.1_64/include >>>> -D__INSDIR__=src/snes/examples/tutorials/ ex19.c >>>>> mpicxx -Wall -Wwrite-strings >> -Wno-strict-aliasing >>>> -Wno-unknown-pragmas -g -o ex19 >>>> ex19.o >>>> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> >>>> -lpetsc >>>> >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >>>> -lparmetis -lmetis >>>> >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng >> -lmex >>>> -lmx -lmat -lut -licudata -licui18n -licuuc -lml >> -lchaco >>>> -L/usr/lib/atlas -llapack_atlas -llapack -latlas >> -lblas >>>> -L/home/bjha/tools/gcc-4.4.1_64/lib >>>> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >>>> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte >> -lopen-pal >>>> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 >> -lmpi_f77 >>>> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ >> -lmpi_cxx >>>> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl >> -lutil >>>> -lgcc_s -lpthread -ldl >>>>> /usr/bin/ld: cannot find -licudata >>>>> collect2: ld returned 1 exit status >>>>> make[3]: [ex19] Error 1 (ignored) >>>>> /bin/rm -f ex19.o >>>>> 
--------------Error detected during compile >> or >>>> link!----------------------- >>>>> See http://www.mcs.anl.gov/petsc/petsc-2/documentation/faq.html >>>>> mpif90 -c -Wall -Wno-unused-variable >>>> >> -g -I/home/bjha/src/petsc-dev/include >>>> >> -I/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/include >>>> -I/home/bjha/src/petsc-dev/include/sieve >>>> -I/home/bjha/MATLAB/R2010b/extern/include >>>> -I/home/bjha/tools/gcc-4.4.1_64/include >> -o >>>> ex5f.o ex5f.F >>>>> mpif90 -Wall -Wno-unused-variable >>>> -g -o ex5f ex5f.o >>>> -L/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >> >>>> -lpetsc >>>> >> -Wl,-rpath,/home/bjha/src/petsc-dev/linux_gcc-4.4.1_64/lib >>>> -lparmetis -lmetis >>>> >> -Wl,-rpath,/home/bjha/MATLAB/R2010b/sys/os/glnx86:/home/bjha/MATLAB/R2010b/bin/glnx86:/home/bjha/MATLAB/R2010b/extern/lib/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/bin/glnx86 >>>> -L/home/bjha/MATLAB/R2010b/extern/lib/glnx86 -leng >> -lmex >>>> -lmx -lmat -lut -licudata -licui18n -licuuc -lml >> -lchaco >>>> -L/usr/lib/atlas -llapack_atlas -llapack -latlas >> -lblas >>>> -L/home/bjha/tools/gcc-4.4.1_64/lib >>>> -L/usr/lib/gcc/i486-linux-gnu/4.4.3 >>>> -L/usr/lib/i486-linux-gnu -ldl -lmpi -lopen-rte >> -lopen-pal >>>> -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 >> -lmpi_f77 >>>> -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ >> -lmpi_cxx >>>> -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl >> -lutil >>>> -lgcc_s -lpthread -ldl >>>>> /usr/bin/ld: cannot find -licudata >>>>> collect2: ld returned 1 exit status >>>>> make[3]: [ex5f] Error 1 (ignored) >>>>> /bin/rm -f ex5f.o >>>>> Completed test examples >>>>> >>>>> >>>>> It is correct that I didn't install matlab in >> the >>>> default directory (/usr/local) because I had some >> permission >>>> issues on Ubuntu. But I have been running matlab >> (by running >>>> /MATLAB/R2010b/bin/matlab.sh) without any issues >>>>> for some time now. So it should be that, I >> suppose. >>>>> >>>>> I still went ahead with compiling my >> application with >>>> few lines of PetscMatlabEngine functions, just to >> test: >>>>> >>>>> PetscMatlabEngine e; >>>>> PetscScalar *array; >> array[0]=0; >>>>> const char name[]="a"; >>>>> >>>> >> PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e); >>>>> PetscMatlabEnginePutArray(e,1,1,array,name); >>>>> PetscMatlabEngineGetArray(e,1,1,array,name); >>>>> PetscMatlabEngineDestroy(e); >>>>> >>>>> Do I need to include any header file (e.g. >>>> petscmatlab.h) in the header of my class file? >>>> >>>> No >>>> >>>>> Right now, the application compiled (make, >> make >>>> install) fine without any such include file. The >> application >>>> have been using Petsc for its solver without any >> issues, so >>>> it includes all the necessary files. I just want >> to exten >>>> the application to call some matlab scripts by >> using >>>> PetscMatlabEngine. 
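A side note on the small test fragment quoted in this thread: "PetscScalar *array; array[0]=0;" writes through an uninitialized pointer before the engine is ever created, which is undefined behavior independent of the link problem. A minimal corrected sketch follows; the calls and their argument order are kept exactly as quoted (the petsc-dev of that time), only real storage and the usual error checking are added, and the function name is illustrative.

#include <petsc.h>
#include <petscmatlab.h>   /* PetscMatlabEngine prototypes; may already be pulled in via petsc.h */

/* Same round-trip test as quoted above, but with real storage for the scalar
   instead of an uninitialized pointer. */
PetscErrorCode TestMatlabEngine(void)
{
  PetscMatlabEngine e;
  PetscScalar       array[1] = {0.0};
  const char        name[]   = "a";
  PetscErrorCode    ierr;

  ierr = PetscMatlabEngineCreate(PETSC_COMM_WORLD,PETSC_NULL,&e);CHKERRQ(ierr);
  ierr = PetscMatlabEnginePutArray(e,1,1,array,name);CHKERRQ(ierr);
  ierr = PetscMatlabEngineGetArray(e,1,1,array,name);CHKERRQ(ierr);
  ierr = PetscMatlabEngineDestroy(e);CHKERRQ(ierr);
  return 0;
}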
>>>>> >>>>> But, I get runtime error for mexPrintf: >>>>> >>>>> bjha at ubuntu:~/src/pylith-dev/examples/3d/hex8$ >> pylith >>>> step06_pres.cfg >>>>> Traceback (most recent call last): >>>>> File >> "/home/bjha/tools/gcc-4.4.1_64/bin/pylith", >>>> line 37, in >>>>> from >> pylith.apps.PyLithApp import >>>> PyLithApp >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PyLithApp.py", >>>> line 23, in >>>>> from PetscApplication >> import >>>> PetscApplication >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >>>> line 27, in >>>>> class >> PetscApplication(Application): >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/apps/PetscApplication.py", >>>> line 41, in PetscApplication >>>>> from >> pylith.utils.PetscManager import >>>> PetscManager >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/PetscManager.py", >>>> line 29, in >>>>> import >> pylith.utils.petsc as petsc >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >>>> line 25, in >>>>> _petsc = >> swig_import_helper() >>>>> File >>>> >> "/home/bjha/tools/gcc-4.4.1_64/lib/python2.6/site-packages/pylith/utils/petsc.py", >>>> line 21, in swig_import_helper >>>>> _mod = >> imp.load_module('_petsc', fp, >>>> pathname, description) >>>>> ImportError: >>>> /home/bjha/tools/gcc-4.4.1_64/lib/libpylith.so.0: >> undefined >>>> symbol: mexPrintf >>>> >>>> Somehow all the Matlab shared >> libraries need to be >>>> found when python loads libpylith.so is >> loaded. I >>>> don't know how this is done in Linux. It is really >> a python >>>> question if you want to use a dynamic library in >> python that >>>> uses another shared library how do you make sure >> python gets >>>> all the shared libraries loaded to resolve the >> symbols? >>>> >>>> Barry >>>> >>>>> >>>>> >>>>> Can anyone help/suggest something? >>>>> >>>>> Thanks a lot >>>>> Bir >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> > > > From juhaj at iki.fi Wed Feb 16 08:17:07 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 14:17:07 +0000 Subject: [petsc-users] KSPBuildSolution Message-ID: <201102161417.09649.juhaj@iki.fi> Hi all! I have a problem with KSPBuildSolution. Either there is something wrong with KSPBuildSolution or my with understanding of it. Most likely the latter. I would think from the docs that KSPBuildSolution gives (in its last parameter) the current estimate of the solution and that at the last iteration before the whole thing converges, it would be THE solution. However, this is not true: what KSPBuildSolution gives me is close to zero everywhere (including my right boundary value which is supposed to be 1!) even though the eventual solution returned by SNESSolve() is correct. What exactly is KSPBuildSolution giving me or is it buggy? Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. 
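As the replies below explain, when the KSP sits inside SNESSolve it is solving for the Newton correction, J(u_k) du = -F(u_k), with the new iterate u_{k+1} = u_k + lambda du, so KSPBuildSolution assembles the current estimate of du rather than of u itself. A minimal sketch of inspecting that iterate from a monitor is shown here; it assumes the PETSc of that era, and MyKSPMonitor is an illustrative name, not code from this thread.

#include <petscksp.h>

/* Print the current KSP iterate at every linear iteration.  Inside a SNES
   solve this vector is the Newton correction du, which is why its Dirichlet
   entries look like zero once the boundary values have been picked up. */
static PetscErrorCode MyKSPMonitor(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx)
{
  Vec            x;
  PetscErrorCode ierr;

  ierr = KSPBuildSolution(ksp,PETSC_NULL,&x);CHKERRQ(ierr); /* vector is owned by the KSP; do not destroy it */
  ierr = VecView(x,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  return 0;
}

/* registration, e.g. after SNESGetKSP(snes,&ksp):
   ierr = KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr); */

The converged field itself is only available from the nonlinear solver, e.g. via SNESGetSolution() after SNESSolve() returns.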
URL: From jed at 59A2.org Wed Feb 16 08:23:42 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 15:23:42 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161417.09649.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 15:17, Juha J?ykk? wrote: > even > though the eventual solution returned by SNESSolve() is correct. > SNESSolve uses a Newton method so the linear system is being solving for a defect. If the initial guess is zero, then it would normally pick up your Dirichlet boundary conditions on the first iteration and all subsequent solves would have zero in those locations. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 09:18:43 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 15:18:43 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> Message-ID: <201102161518.47281.juhaj@iki.fi> > SNESSolve uses a Newton method so the linear system is being solving for a So the "solution" in the KSP should actually be identically zero for a converged result? > defect. If the initial guess is zero, then it would normally pick up your > Dirichlet boundary conditions on the first iteration and all subsequent > solves would have zero in those locations. It is not zero initially. But, on the other hand, it has zeros at both ends even at the very first iteration. If I understood your reply correctly, this would be expected for an initial guess which has correct values at the boundaries and it should only pick them up on the first iteration if they were not correct to begin with. Having ruled out a possibility of a bug in KSP, I need to continue my hunt for DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make any difference, it always diverges. The funny thing is, it diverges even if I start with an *exact* *solution*... Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From matijakecman at gmail.com Wed Feb 16 05:54:10 2011 From: matijakecman at gmail.com (Matija Kecman) Date: Wed, 16 Feb 2011 11:54:10 +0000 Subject: [petsc-users] Additive Schwarz Method output variable with processor number Message-ID: Dear?Petsc users, I am new to Petsc and I have been compiling and running some of the examples. I have been investigating the Additive Schwarz Method example (ksp/ksp/examples/tutorials/ex8.c) using the 'Basic method' i.e. by setting the overlap and using the default PETSc decomposition. I was investigating the effect of using multiple processors using the following bash script (-n1, -n2 are the mesh dimensions in the x- and y-directions, -overlap specifies the overlap for the PCASMSetOverlap() routine): for proc in 1 2 3 4; do mpirun -np $proc ex8 -machinesfile machinesfile -n1 500 -n2 500 -overlap 2 -pc_asm_blocks 4 -ksp_monitor_true_residual -sub_ksp_type preonly -sub_pc_type lu > ./log_$proc.dat done After cleaning up the log files and plotting log ( ||Ae||/||Ax|| ) with iteration number I generated the attached figure. 
I am wondering why the number of iterations for convergence depends on the number of processors used??According to the FAQ: 'The convergence of many of the preconditioners in PETSc including the the default parallel preconditioner block Jacobi depends on the number of processes. The more processes the (slightly) slower convergence it has. This is the nature of iterative solvers, the more parallelism means the more "older" information is used in the solution process hence slower convergence.' but I seem to be observing the opposite effect. Many thanks, Matija -------------- next part -------------- A non-text attachment was scrubbed... Name: ASM.pdf Type: application/pdf Size: 7173 bytes Desc: not available URL: From jed at 59A2.org Wed Feb 16 09:25:39 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 16:25:39 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161518.47281.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 16:18, Juha J?ykk? wrote: > So the "solution" in the KSP should actually be identically zero for a > converged result? > Yes > > > defect. If the initial guess is zero, then it would normally pick up your > > Dirichlet boundary conditions on the first iteration and all subsequent > > solves would have zero in those locations. > > It is not zero initially. But, on the other hand, it has zeros at both ends > even at the very first iteration. If I understood your reply correctly, > this > would be expected for an initial guess which has correct values at the > boundaries and it should only pick them up on the first iteration if they > were > not correct to begin with. > > Having ruled out a possibility of a bug in KSP, I need to continue my hunt > for > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > any > difference, it always diverges. The funny thing is, it diverges even if I > start with an *exact* *solution*... > This is a problem. Run with -ksp_converged_reason to find out why it's diverging. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 09:33:29 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 15:33:29 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: <201102161533.32506.juhaj@iki.fi> > > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > This is a problem. Run with -ksp_converged_reason to find out why it's > diverging. Sorry, I looked at a wrong output. For the exact solution, it is the line search, which fails. KSP converges with CONVERGED_RTOL, but SNES quits with DIVERGED_LS_FAILURE. -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 09:44:25 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 09:44:25 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161518.47281.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 9:18 AM, Juha J?ykk? 
wrote: > > SNESSolve uses a Newton method so the linear system is being solving for > a > > So the "solution" in the KSP should actually be identically zero for a > converged result? > It is a correction, and the correction to the exact answer is zero. > > defect. If the initial guess is zero, then it would normally pick up your > > Dirichlet boundary conditions on the first iteration and all subsequent > > solves would have zero in those locations. > > It is not zero initially. But, on the other hand, it has zeros at both ends > even at the very first iteration. If I understood your reply correctly, > this > would be expected for an initial guess which has correct values at the > boundaries and it should only pick them up on the first iteration if they > were > not correct to begin with. > Yes. > Having ruled out a possibility of a bug in KSP, I need to continue my hunt > for > DIVERGED_LINEAR_SOLVE... None of the convergence tolerances seem to make > any > difference, it always diverges. The funny thing is, it diverges even if I > start with an *exact* *solution*... > It is a good idea to use -ksp_type preonly -pc_type lu to start until you understand the problem. Matt > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 09:39:32 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 16:39:32 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161533.32506.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161533.32506.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 16:33, Juha J?ykk? wrote: > Sorry, I looked at a wrong output. For the exact solution, it is the line > search, which fails. KSP converges with CONVERGED_RTOL, but SNES quits with > DIVERGED_LS_FAILURE. > Your Jacobian may be incorrect, try running with -snes_mf_operator -pc_type lu -ksp_monitor. The linear solves should converge in 1 iteration. You can also try -mat_mffd_type ds if the residual is ill-conditioned. If you can make the problem small, use -snes_type test to check the correctness of the Jacobian directly and -snes_test_display to show how the entries. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 10:04:05 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 16:04:05 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> Message-ID: <201102161604.08073.juhaj@iki.fi> Since I got two emails before I could reply to one, I am replying to both simultaneously. > It is a good idea to use -ksp_type preonly -pc_type lu to start until you > understand the problem. Matthew: Thanks, but I still get diverging line searches. > Your Jacobian may be incorrect, try running with -snes_mf_operator -pc_type > lu -ksp_monitor. The linear solves should converge in 1 iteration. You can > also try -mat_mffd_type ds if the residual is ill-conditioned. Jed: I am running with a FD Jacobian just to make sure my hand-written one is not the culprit. 
Is there any reason to suspect this might be the reason? -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 10:05:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 10:05:46 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161604.08073.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 10:04 AM, Juha J?ykk? wrote: > Since I got two emails before I could reply to one, I am replying to both > simultaneously. > > > It is a good idea to use -ksp_type preonly -pc_type lu to start until you > > understand the problem. > > Matthew: Thanks, but I still get diverging line searches. > > > Your Jacobian may be incorrect, try running with -snes_mf_operator > -pc_type > > lu -ksp_monitor. The linear solves should converge in 1 iteration. You > can > > also try -mat_mffd_type ds if the residual is ill-conditioned. > > Jed: I am running with a FD Jacobian just to make sure my hand-written one > is > not the culprit. Is there any reason to suspect this might be the reason? > Yes, line search failure often occurs for incorrect Jacobians because the solution to the Newton system is not a descent direction, which is checked by the line search. Matt > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 10:06:15 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:06:15 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161604.08073.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161518.47281.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 17:04, Juha J?ykk? wrote: > Is there any reason to suspect this might be the reason? Yes, it is the most common place to make programming mistakes and the symptoms you describe are typical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Wed Feb 16 10:28:09 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:28:09 +0100 Subject: [petsc-users] Additive Schwarz Method output variable with processor number In-Reply-To: References: Message-ID: On Wed, Feb 16, 2011 at 12:54, Matija Kecman wrote: > After cleaning up the log files and plotting log ( ||Ae||/||Ax|| ) > with iteration number I generated the attached figure. I am wondering > why the number of iterations for convergence depends on the number of > processors used? According to the FAQ: > > 'The convergence of many of the preconditioners in PETSc including the > the default parallel preconditioner block Jacobi depends on the number > of processes. The more processes the (slightly) slower convergence it > has. 
This is the nature of iterative solvers, the more parallelism > means the more "older" information is used in the solution process > hence slower convergence.' > > but I seem to be observing the opposite effect. > You are using the same number of subdomains, but they are shaped differently. It seems likely that you have Parmetis installed in which case PCASM uses it to partition multiple subdomains on each process. In this case, those domains are not as good as the rectangular partition that you get by using more processes. Compare: $ mpiexec -n 1 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 -mat_partitioning_type parmetis Linear solve converged due to CONVERGED_RTOL iterations 27 $ mpiexec -n 1 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 -mat_partitioning_type square Linear solve converged due to CONVERGED_RTOL iterations 22 $ mpiexec -n 4 ./ex8 -m 200 -n 200 -sub_pc_type lu -ksp_converged_reason -pc_type asm -pc_asm_blocks 4 Linear solve converged due to CONVERGED_RTOL iterations 22 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 10:31:47 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 16:31:47 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> Message-ID: <201102161631.50314.juhaj@iki.fi> > Yes, it is the most common place to make programming mistakes and the > symptoms you describe are typical. Please let me double-check there has not been a misunderstanding here: the problems I describe occur with the PETSc built-in FD Jacobian approximation, not my own. Now, I realise this will be a less-than-optimal approximation, but I fail to see how there could be a programming mistake, when I am using SNESDefaultComputeJacobianColor and not my hand-written Jacobian. I do get the same symptoms with the hand-written one, too. That's why I wanted to check with the PETSc built in FD version. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From jed at 59A2.org Wed Feb 16 10:37:06 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 17:37:06 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161631.50314.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 17:31, Juha J?ykk? wrote: > Please let me double-check there has not been a misunderstanding here: the > problems I describe occur with the PETSc built-in FD Jacobian > approximation, > not my own. Now, I realise this will be a less-than-optimal approximation, > but > I fail to see how there could be a programming mistake, when I am using > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > I do get the same symptoms with the hand-written one, too. That's why I > wanted > to check with the PETSc built in FD version. > If your system is poorly scaled or genuinely ill-conditioned, the FD Jacobian could be bad. 
Sometimes it helps to use a more robust method of determining the differencing parameter: -mat_fd_type ds (when using coloring) or -mat_mffd_type ds (when using -snes_mf_operator). You can also try solving the linear system to higher tolerance and looking at the true residual to be sure the linear system really is solved accurately. What sort of problem are you solving? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 16 10:40:50 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 10:40:50 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161631.50314.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> Message-ID: <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Try using SNESDefaultComputeJacobian() see if that makes any difference. 99.9% of the causes of non-convergencing Newton are wrong or slightly wrong Jacobians. Very unlikely possibilities are 1) it is converging to a local minimum that is not a solution. This is checked by PETSc automatically if the line search failed so is unlikely to be the problem. But run with -info and it will print a great deal of information about the nonlinear solver including a message about " near zero implies" cut and paste all the message about the "near zero" and send it to us. 2) the function is not smooth so Newton's taylor series approximation simply doesn't work. Barry On Feb 16, 2011, at 10:31 AM, Juha J?ykk? wrote: >> Yes, it is the most common place to make programming mistakes and the >> symptoms you describe are typical. > > Please let me double-check there has not been a misunderstanding here: the > problems I describe occur with the PETSc built-in FD Jacobian approximation, > not my own. Now, I realise this will be a less-than-optimal approximation, but > I fail to see how there could be a programming mistake, when I am using > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > I do get the same symptoms with the hand-written one, too. That's why I wanted > to check with the PETSc built in FD version. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From knepley at gmail.com Wed Feb 16 11:42:14 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 11:42:14 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102161604.08073.juhaj@iki.fi> <201102161631.50314.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Message-ID: On Wed, Feb 16, 2011 at 10:40 AM, Barry Smith wrote: > > Try using SNESDefaultComputeJacobian() see if that makes any difference. > > 99.9% of the causes of non-convergencing Newton are wrong or slightly > wrong Jacobians. Very unlikely possibilities are > > 1) it is converging to a local minimum that is not a solution. This is > checked by PETSc automatically if the line search failed so is unlikely to > be the problem. But run with -info and it will print a great deal of > information about the nonlinear solver including a message about " near > zero implies" cut and paste all the message about the "near zero" and send > it to us. > > 2) the function is not smooth so Newton's taylor series approximation > simply doesn't work. 
Unlikely possibility #3: You have written an equation with no real solutions, meaning there is a mistake in your function. Matt > > Barry > > On Feb 16, 2011, at 10:31 AM, Juha J?ykk? wrote: > > >> Yes, it is the most common place to make programming mistakes and the > >> symptoms you describe are typical. > > > > Please let me double-check there has not been a misunderstanding here: > the > > problems I describe occur with the PETSc built-in FD Jacobian > approximation, > > not my own. Now, I realise this will be a less-than-optimal > approximation, but > > I fail to see how there could be a programming mistake, when I am using > > SNESDefaultComputeJacobianColor and not my hand-written Jacobian. > > > > I do get the same symptoms with the hand-written one, too. That's why I > wanted > > to check with the PETSc built in FD version. > > > > Cheers, > > Juha > > > > -- > > ----------------------------------------------- > > | Juha J?ykk?, juhaj at iki.fi | > > | http://www.maths.leeds.ac.uk/~juhaj | > > ----------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 13:39:42 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Wed, 16 Feb 2011 19:39:42 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> Message-ID: <201102161939.45779.juhaj@iki.fi> > > 1) it is converging to a local minimum that is not a solution. This is > > checked by PETSc automatically if the line search failed so is unlikely > > to be the problem. But run with -info and it will print a great deal of > > information about the nonlinear solver including a message about " near > > zero implies" cut and paste all the message about the "near zero" and > > send it to us. There is just one and it is the last message before the usual -snes_monitor output: [0] SNESLSCheckLocalMin_Private(): (F^T J random)/(|| F ||*||random|| 20.4682 near zero implies found a local minimum This is with SNES...JacobianColor(). With my own Jacobian, there are none. > > 2) the function is not smooth so Newton's taylor series approximation > > simply doesn't work. Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or the function evaluated by FormFunction? I find it unlikely that the solution would not be at least twice differentiable (apart from the endpoints, where it is not): it is almost guaranteed to be since the equation is the Euler-Lagrange equation of a well behaved action integral. As for F(u), the function is a polynomial of u and x (x being the coordinate), so it is smooth, if u is. > Unlikely possibility #3: > > You have written an equation with no real solutions, meaning there is a > mistake in your function. But my initial guess is an exact solution. I have two free parameters in the equation and for a single choice I can find an exact solution - it happens to be u(x) = x, so the discrete derivatives are exactly the same as the continuous ones (apart from floating point rounding errors, of course). Now, it is quite possible that my problem is poorly scaled or ill-conditioned, like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices somehow? 
-Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 13:44:13 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 13:44:13 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161939.45779.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> <201102161939.45779.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 1:39 PM, Juha J?ykk? wrote: > > > 1) it is converging to a local minimum that is not a solution. This is > > > checked by PETSc automatically if the line search failed so is unlikely > > > to be the problem. But run with -info and it will print a great deal of > > > information about the nonlinear solver including a message about " > near > > > zero implies" cut and paste all the message about the "near zero" and > > > send it to us. > > There is just one and it is the last message before the usual -snes_monitor > output: > > [0] SNESLSCheckLocalMin_Private(): (F^T J random)/(|| F ||*||random|| > 20.4682 > near zero implies found a local minimum > > This is with SNES...JacobianColor(). With my own Jacobian, there are none. > > > > 2) the function is not smooth so Newton's taylor series approximation > > > simply doesn't work. > > Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or > the function evaluated by FormFunction? > > I find it unlikely that the solution would not be at least twice > differentiable (apart from the endpoints, where it is not): it is almost > guaranteed to be since the equation is the Euler-Lagrange equation of a > well > behaved action integral. > > As for F(u), the function is a polynomial of u and x (x being the > coordinate), > so it is smooth, if u is. > > > Unlikely possibility #3: > > > > You have written an equation with no real solutions, meaning there is a > > mistake in your function. > > But my initial guess is an exact solution. I have two free parameters in > the > equation and for a single choice I can find an exact solution - it happens > to > be u(x) = x, so the discrete derivatives are exactly the same as the > continuous ones (apart from floating point rounding errors, of course). > Wait, if your initial guess is an exact solution, there should be no KSP solve. Matt > Now, it is quite possible that my problem is poorly scaled or > ill-conditioned, > like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices > somehow? > > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
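Barry's suggestion above, trying SNESDefaultComputeJacobian(), amounts to swapping the routine handed to SNESSetJacobian(). A minimal sketch under the calling convention of the petsc-dev of that time (later releases rename it SNESComputeJacobianDefault); snes, J and ierr are assumed to exist already, and FormJacobian and user are placeholders for the application's own routine and context:

/* the hand-coded Jacobian under suspicion */
ierr = SNESSetJacobian(snes,J,J,FormJacobian,&user);CHKERRQ(ierr);

/* brute-force finite differences: dense and slow, but a useful cross-check on
   small problems; this is what the -snes_fd option selects at run time */
ierr = SNESSetJacobian(snes,J,J,SNESDefaultComputeJacobian,PETSC_NULL);CHKERRQ(ierr);

If the two give visibly different convergence histories, the hand-coded routine is the first suspect.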
URL: From jed at 59A2.org Wed Feb 16 13:49:33 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 16 Feb 2011 20:49:33 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102161939.45779.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <17D1BEBC-4385-425F-8DD3-0646A0CB5863@mcs.anl.gov> <201102161939.45779.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 20:39, Juha J?ykk? wrote: > Which function, F(u) or my u*, which satisfies F(u*)=0? I.e. the unknown or > the function evaluated by FormFunction? > > I find it unlikely that the solution would not be at least twice > differentiable (apart from the endpoints, where it is not): it is almost > guaranteed to be since the equation is the Euler-Lagrange equation of a > well > behaved action integral. > > As for F(u), the function is a polynomial of u and x (x being the > coordinate), > so it is smooth, if u is. > We just want F(u) to be continuously differentiable as a function from R^n to R^n (were n is the size of u). > > > Unlikely possibility #3: > > > > You have written an equation with no real solutions, meaning there is a > > mistake in your function. > > But my initial guess is an exact solution. > So the initial SNES residual is nearly zero? Then the differencing and the solve could be dominated by rounding error. > I have two free parameters in the > equation and for a single choice I can find an exact solution - it happens > to > be u(x) = x, so the discrete derivatives are exactly the same as the > continuous ones (apart from floating point rounding errors, of course). > > Now, it is quite possible that my problem is poorly scaled or > ill-conditioned, > like Jed Brown suggested. Can I check the eigenvalues of the KSP matrices > somehow? > There are a few ways to do it automatically as part of the solve. These shows you spectral information for the preconditioned operator so run with -pc_type none to see the true spectrum. You typically need to use GMRES for these to work: -ksp_monitor_singular_value : Monitor singular values (KSPMonitorSet) -ksp_compute_singularvalues: Compute singular values of preconditioned operator (KSPSetComputeSingularValues) -ksp_compute_eigenvalues: Compute eigenvalues of preconditioned operator (KSPSetComputeSingularValues) -ksp_plot_eigenvalues: Scatter plot extreme eigenvalues (KSPSetComputeSingularValues) -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Wed Feb 16 18:11:45 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 00:11:45 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102161939.45779.juhaj@iki.fi> Message-ID: <201102170011.54502.juhaj@iki.fi> Hi again. I do not know which one to answer to, but thanks to all. I got one step further with your help: [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! Adding -pc_factor_shift_type POSITIVE_DEFINITE helps and now I get at least some progress. It does not matter if I use -snes_fd, coloring or my hand- written Jacobian. If I start from a non-solution initial guess, they all progress somewhat towards what I believe is more or less a real solution. This is all ran with -pc_type lu -ksp_type preonly, btw. However, they do not seem to change the second to the last (counting from left to right) value on the lattice (this is 1D). It does change, but only by 0.02% while values elsewhere in the lattice change much more significantly. 
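Put together, a run exposing the unpreconditioned spectrum might look like the line below. The executable name is a placeholder; the options are the ones listed above, and GMRES is spelled out only for clarity since it is the default KSP.

./myprogram -snes_monitor -pc_type none -ksp_type gmres -ksp_monitor_singular_value -ksp_compute_eigenvalues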
Again, I am running with the parameters where I know the exact solution and my initial guess is, on purpose, 0.01 off the correct one (except at the endpoints). What could be causing this? One thing strikes me as odd. I tried checking the value of the function at the first iteration and at that particular point it is of the order of -1.e-9, whereas in the next point to the left, it is +1.e-9 and then starts increasing towards the left before it starts decreasing again around half-way through the lattice. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Wed Feb 16 18:30:45 2011 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Feb 2011 18:30:45 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102170011.54502.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102161939.45779.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> Message-ID: On Wed, Feb 16, 2011 at 6:11 PM, Juha J?ykk? wrote: > Hi again. > > I do not know which one to answer to, but thanks to all. I got one step > further with your help: > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > You have a singular Jacobian, which leads me to believe that your boundary conditions are incorrect. Matt > Adding -pc_factor_shift_type POSITIVE_DEFINITE helps and now I get at least > some progress. It does not matter if I use -snes_fd, coloring or my hand- > written Jacobian. If I start from a non-solution initial guess, they all > progress somewhat towards what I believe is more or less a real solution. > This > is all ran with -pc_type lu -ksp_type preonly, btw. > > However, they do not seem to change the second to the last (counting from > left > to right) value on the lattice (this is 1D). It does change, but only by > 0.02% > while values elsewhere in the lattice change much more significantly. > Again, I > am running with the parameters where I know the exact solution and my > initial > guess is, on purpose, 0.01 off the correct one (except at the endpoints). > What > could be causing this? > > One thing strikes me as odd. I tried checking the value of the function at > the > first iteration and at that particular point it is of the order of -1.e-9, > whereas in the next point to the left, it is +1.e-9 and then starts > increasing > towards the left before it starts decreasing again around half-way through > the > lattice. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgrabbani at gmail.com Wed Feb 16 21:18:21 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Wed, 16 Feb 2011 19:18:21 -0800 Subject: [petsc-users] Removing petsc Message-ID: Hi, I am new to petsc and also new to linux/unix environment. 
Recently I started converting a matlab nanodevice simulaiton code into a c code using petsc and have done most of the converting. But in the process I have installed petsc 4/5 times with different configuration options (real, complex, with blopex, etc). Now that I do not need some of them, I want to uninstall them. Can I simply delete the unwanted versions (all the versions are in separate folders)? Please advise me if something different is needed. Thanks in advance, Golam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 16 21:20:35 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 16 Feb 2011 21:20:35 -0600 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: If you just used different PETSC_ARCH variables like arch-complex arch-opt etc you can just delete those directories and that will remove everything related to them. Barry On Feb 16, 2011, at 9:18 PM, Golam Rabbani wrote: > Hi, > > I am new to petsc and also new to linux/unix environment. > Recently I started converting a matlab nanodevice simulaiton code into a c code using petsc and have done most of the converting. > But in the process I have installed petsc 4/5 times with different configuration options (real, complex, with blopex, etc). Now that I > do not need some of them, I want to uninstall them. Can I simply delete the unwanted versions (all the versions are in separate folders)? > Please advise me if something different is needed. > > Thanks in advance, > Golam From elhombrefr at hotmail.fr Thu Feb 17 08:01:14 2011 From: elhombrefr at hotmail.fr (El Hombre Frances) Date: Thu, 17 Feb 2011 15:01:14 +0100 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran Message-ID: <4D5D2A2A.2070707@hotmail.fr> Hi, I'm looking for how to get information about the DA of a DMMG finest grid. I saw that you can access them with call DMMGGetDA(... call DAGetInfo(... but it doesn't work in the main program, i get this error 0]PETSC ERROR: Invalid argument! [0]PETSC ERROR: Wrong type of object: Parameter # 1! I tried with this example http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html I don't know how to get DA infos outside the subroutines ComputeRHS and ComputeJacobian I want grid size of the DA 3D in order to plot the field. I noticed that it's possible with PetscObjectQuery in C language. Thanks for your help Pierre Navaro From juhaj at iki.fi Thu Feb 17 08:08:37 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 14:08:37 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> Message-ID: <201102171408.40609.juhaj@iki.fi> > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > You have a singular Jacobian, which leads me to believe that your boundary > conditions are incorrect. Thanks for the tip. I also found the following in the -info output: [0] SNESLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 27.9594 near zero implies inconsistent rhs which is strange at first sight: my RHS is my equation, how can that be inconsistent? It is what defines the problem. BUT your comment about the boundary conditions led me to look into them in more detail. It seems one of them may be trivially satisfied, leaving me with an infinite number of solutions satisfying the other one too. Could this be the reason for my problems? 
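The non-uniqueness in this example can be made explicit. For f(r) = a r^b one has f'/f = b/r and f''/f = b(b-1)/r^2, so

  r f''/f - r (f'/f)^2 + f'/f = [ b(b-1) - b^2 + b ] / r = 0

for every a and b; the conditions f(0)=0, f(1)=1 only fix a=1 and leave b>0 free. Differentiating the family r^b with respect to b gives v(r) = r^b ln(r), which vanishes at both ends and satisfies the linearized equation, so the Jacobian at any such solution has a null vector. That is consistent with the zero-pivot and "near zero implies" messages reported earlier in the thread.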
If so, I need to start thinking of another boundary condition... not nice, since the problem I am solving does not really give them! =( Take an example: r f''/f - r (f'/f)^2 + f'/f = 0. This has the general solution f(r) = a r^b, but if the boundary conditions are f(0)=0 and f(1)=1 (like I had in my real problem), we have (if b>0): a*0 = 0 a*1 = 1 and b is left undetermined or (if b<0): lim(a*r^b) = 0 as r->0 => a=0 0*1^b = 1 => no solution. I am unsure how to interpret b=0 since then we have 0^0 at the boundary. But even disregarding the undefined value at the boundary, there cannot be a continuous solution with b=0 since then f(r>0)=1, but f(0)=0. Looks like Heaviside step to me. In summary, I cannot even solve this simpler problem with SNES, but I think I understand the reason now: I have not specified boundary conditions which would give a unique solution - they probably do not even give a finite number of solutions, but an inifinite (quite likely uncountable, like in my example) number of solutions. Any thoughts on this? Is there any sense in my analysis? If so, I will need to go think about Neumann boundaries... Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From knepley at gmail.com Thu Feb 17 08:15:01 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 08:15:01 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102171408.40609.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102170011.54502.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> Message-ID: On Thu, Feb 17, 2011 at 8:08 AM, Juha J?ykk? wrote: > > > [0]PETSC ERROR: Zero pivot row 1 value 1.90611e-13 tolerance 1e-12! > > You have a singular Jacobian, which leads me to believe that your > boundary > > conditions are incorrect. > > Thanks for the tip. I also found the following in the -info output: > > [0] SNESLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 27.9594 near zero > implies inconsistent rhs > > which is strange at first sight: my RHS is my equation, how can that be > inconsistent? It is what defines the problem. BUT your comment about the > boundary conditions led me to look into them in more detail. > > It seems one of them may be trivially satisfied, leaving me with an > infinite > number of solutions satisfying the other one too. Could this be the reason > for > my problems? If so, I need to start thinking of another boundary > condition... > not nice, since the problem I am solving does not really give them! =( > > Take an example: > > r f''/f - r (f'/f)^2 + f'/f = 0. > > This has the general solution > > f(r) = a r^b, > > but if the boundary conditions are f(0)=0 and f(1)=1 (like I had in my real > problem), we have (if b>0): > > a*0 = 0 > a*1 = 1 > > and b is left undetermined or (if b<0): > > lim(a*r^b) = 0 as r->0 => a=0 > 0*1^b = 1 => no solution. > > I am unsure how to interpret b=0 since then we have 0^0 at the boundary. > But > even disregarding the undefined value at the boundary, there cannot be a > continuous solution with b=0 since then f(r>0)=1, but f(0)=0. Looks like > Heaviside step to me. 
> > In summary, I cannot even solve this simpler problem with SNES, but I think > I > understand the reason now: I have not specified boundary conditions which > would give a unique solution - they probably do not even give a finite > number > of solutions, but an inifinite (quite likely uncountable, like in my > example) > number of solutions. > > Any thoughts on this? Is there any sense in my analysis? If so, I will need > to > go think about Neumann boundaries... Yes, if your BC do not give at least a locally unique solution, then your Jacobian will be rank deficient and Newton breaks down. You can still try Picard, but I recommend understanding what you mean by a solution first. Matt > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 17 08:19:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 08:19:46 -0600 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: <4D5D2A2A.2070707@hotmail.fr> References: <4D5D2A2A.2070707@hotmail.fr> Message-ID: On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances wrote: > Hi, > I'm looking for how to get information about the DA of a DMMG finest grid. > I saw that you can access them with > call DMMGGetDA(... > call DAGetInfo(... > but it doesn't work in the main program, i get this error > 0]PETSC ERROR: Invalid argument! > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > 1) Always send the COMPLETE error message 2) That should work. Notice that we use that on line 75 Matt > I tried with this example > > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html > > I don't know how to get DA infos outside the subroutines ComputeRHS and > ComputeJacobian > > I want grid size of the DA 3D in order to plot the field. I noticed that > it's possible with PetscObjectQuery in C language. > > Thanks for your help > > Pierre Navaro > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From elhombrefr at hotmail.fr Thu Feb 17 09:00:44 2011 From: elhombrefr at hotmail.fr (Pierre Navaro) Date: Thu, 17 Feb 2011 16:00:44 +0100 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: References: <4D5D2A2A.2070707@hotmail.fr> Message-ID: <4D5D381C.7010207@hotmail.fr> Hi I add these lines on line 54 call DMMGGetDA(dmmg,db,ierr) call DAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,ierr) print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz and i get these errors [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Invalid argument! [0]PETSC ERROR: Wrong type of object: Parameter # 1! 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./ex22 on a arch-osx- named m-navaro.u-strasbg.fr by navaro Thu Feb 17 15:59:43 2011 [0]PETSC ERROR: Libraries linked from /opt/petsc-3.1-p7/arch-osx-10.6/lib [0]PETSC ERROR: Configure run at Tue Jan 25 09:41:36 2011 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc="gfortran -m64" --with-cxx=g++ --download-mpich=1 PETSC_ARCH=arch-osx-10.6 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: DAGetInfo() line 309 in src/dm/da/src/daview.c mx,my,mz = 0 0 0 0 Best regards Pierre On 17/02/11 15:19, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances > > wrote: > > Hi, > I'm looking for how to get information about the DA of a DMMG > finest grid. > I saw that you can access them with > call DMMGGetDA(... > call DAGetInfo(... > but it doesn't work in the main program, i get this error > 0]PETSC ERROR: Invalid argument! > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > > > 1) Always send the COMPLETE error message > > 2) That should work. Notice that we use that on line 75 > > Matt > > I tried with this example > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html > > I don't know how to get DA infos outside the subroutines > ComputeRHS and ComputeJacobian > > I want grid size of the DA 3D in order to plot the field. I > noticed that it's possible with PetscObjectQuery in C language. > > Thanks for your help > > Pierre Navaro > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- Pierre NAVARO IRMA - UMR 7501 CNRS/Universite de Strasbourg - Bureau i101 7 rue Rene Descartes F-67084 Strasbourg Cedex, FRANCE. tel : (33) [0]3 68 85 01 73, fax : (33) [0]3 68 85 01 05 http://www-irma.u-strasbg.fr/~navaro -------------- next part -------------- An HTML attachment was scrubbed... 
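For readers following the DMMG discussion above in C rather than Fortran: with the petsc-3.1 API, the finest-level DA can be obtained from the DMMG array with DMMGGetDA() and then queried with DAGetInfo(). The sketch below only illustrates that pattern; the helper name PrintFinestGridSizes is made up for this example, the unused DAGetInfo() outputs are simply passed as PETSC_NULL, and the code has not been run against this exact PETSc build.

#include "petscdmmg.h"

/* Illustrative helper (hypothetical name, not from the thread): print the
   global grid sizes of the finest DA held by a DMMG hierarchy. */
PetscErrorCode PrintFinestGridSizes(DMMG *dmmg)
{
  DA             da;
  PetscInt       mx,my,mz;
  PetscErrorCode ierr;

  da   = DMMGGetDA(dmmg);                 /* DA of the finest level */
  ierr = DAGetInfo(da,PETSC_NULL,&mx,&my,&mz,
                   PETSC_NULL,PETSC_NULL,PETSC_NULL,
                   PETSC_NULL,PETSC_NULL,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"mx,my,mz = %D %D %D\n",mx,my,mz);CHKERRQ(ierr);
  return 0;
}

The same information can be printed from Fortran once the correct object is handed to the info routine, as the reply just below shows.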
URL: From knepley at gmail.com Thu Feb 17 09:29:44 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 09:29:44 -0600 Subject: [petsc-users] Get grid sizes from DMMG finest grid in Fortran In-Reply-To: <4D5D381C.7010207@hotmail.fr> References: <4D5D2A2A.2070707@hotmail.fr> <4D5D381C.7010207@hotmail.fr> Message-ID: On Thu, Feb 17, 2011 at 9:00 AM, Pierre Navaro wrote: > Hi > I add these lines on line 54 > call DMMGGetDA(dmmg,db,ierr) > call DAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, > & > & PETSC_NULL_INTEGER,ierr) > print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz > Fortran is always so easy: DMMG dmmg, dmmgLevel DM da, db PetscInt mx,my,mz call DMMGArrayGetDMMG(dmmg,dmmgLevel,ierr) call DMMGGetDM(dmmgLevel,db,ierr) call DMDAGetInfo(db,PETSC_NULL_INTEGER,mx,my,mz, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,PETSC_NULL_INTEGER, & & PETSC_NULL_INTEGER,ierr) print"('mx,my,mz =',3i4,X,i6)",mx,my,mz, mx*my*mz Matt > and i get these errors > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Invalid argument! > > [0]PETSC ERROR: Wrong type of object: Parameter # 1! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 > CST 2010 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./ex22 on a arch-osx- named m-navaro.u-strasbg.fr by > navaro Thu Feb 17 15:59:43 2011 > [0]PETSC ERROR: Libraries linked from /opt/petsc-3.1-p7/arch-osx-10.6/lib > [0]PETSC ERROR: Configure run at Tue Jan 25 09:41:36 2011 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc="gfortran -m64" > --with-cxx=g++ --download-mpich=1 PETSC_ARCH=arch-osx-10.6 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: DAGetInfo() line 309 in src/dm/da/src/daview.c > mx,my,mz = 0 0 0 0 > > Best regards > Pierre > > > On 17/02/11 15:19, Matthew Knepley wrote: > > On Thu, Feb 17, 2011 at 8:01 AM, El Hombre Frances wrote: > >> Hi, >> I'm looking for how to get information about the DA of a DMMG finest grid. >> I saw that you can access them with >> call DMMGGetDA(... >> call DAGetInfo(... >> but it doesn't work in the main program, i get this error >> 0]PETSC ERROR: Invalid argument! >> [0]PETSC ERROR: Wrong type of object: Parameter # 1! >> > > 1) Always send the COMPLETE error message > > 2) That should work. Notice that we use that on line 75 > > Matt > > >> I tried with this example >> >> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F.html >> >> I don't know how to get DA infos outside the subroutines ComputeRHS and >> ComputeJacobian >> >> I want grid size of the DA 3D in order to plot the field. I noticed that >> it's possible with PetscObjectQuery in C language. 
>> >> Thanks for your help >> >> Pierre Navaro >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > -- > Pierre NAVARO > IRMA - UMR 7501 CNRS/Universite de Strasbourg - Bureau i101 > 7 rue Rene Descartes F-67084 Strasbourg Cedex, FRANCE. > tel : (33) [0]3 68 85 01 73, fax : (33) [0]3 68 85 01 05http://www-irma.u-strasbg.fr/~navaro > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Thu Feb 17 10:06:24 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 09:06:24 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI Message-ID: <1297958784.14179.11.camel@echo.lanl.gov> So I thought I understood how VecScatters worked, but apparently not. Is it possible to create a general VecScatter from an arbitrarily partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with the same global sizes (or same global IS sizes) but different local sizes? Shouldn't this just be a matter of relying upon the implied LocalToGlobalMapping? See below snippet (and its errors): Ethan Vec vA Vec vB VecScatter scatter_AB PetscInt np PetscInt rank PetscErrorCode ierr if (rank.eq.0) np = 3 if (rank.eq.1) np = 1 call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, scatter_AB, ierr) ... $> mpiexec -n 2 ./test [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Nonconforming object sizes! [0]PETSC ERROR: Local scatter sizes don't match! [0]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Nonconforming object sizes! [1]PETSC ERROR: Local scatter sizes don't match! [0]PETSC ERROR: Petsc Development HG revision: 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 2011 -0600 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 08:14:57 2011 [0]PETSC ERROR: Libraries linked from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 [0]PETSC ERROR: Configure options --with-debugging=1 --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: VecScatterCreate() line 1432 in src/vec/vec/utils/vscat.c application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From knepley at gmail.com Thu Feb 17 10:35:37 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 10:35:37 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297958784.14179.11.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but apparently not. > Is it possible to create a general VecScatter from an arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with > the same global sizes (or same global IS sizes) but different local > sizes? Shouldn't this just be a matter of relying upon the implied > LocalToGlobalMapping? > No, the way you have to do this is to map a global Vec to a bunch of sequential local Vecs with the sizes you want. This is also how we map to overlapping arrays. Matt > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared > --download-mpich=1 --download-ml=1 --download-umfpack=1 > --with-blas-lapack-dir=/usr/lib --download-parmetis=yes > PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 > --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominik at itis.ethz.ch Thu Feb 17 10:37:35 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Thu, 17 Feb 2011 17:37:35 +0100 Subject: [petsc-users] custom compiler flags on Windows Message-ID: I need to use some special compile flags when compiling with 'cl' on Windows. While configuring I currently use --with-cxx='win32fe cl', which works fine, but if I add some flags after cl the configure brakes, complaining that the compiler does not work. I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the same as before. Is there a way to specify my own flags with Petsc (or add to them)? Best regards, Dominik From knepley at gmail.com Thu Feb 17 10:59:44 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 10:59:44 -0600 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: > I need to use some special compile flags when compiling with 'cl' on > Windows. > While configuring I currently use --with-cxx='win32fe cl', which works > fine, but if I add some flags after cl the configure brakes, > complaining that the compiler does not work. > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the > same as before. > Is there a way to specify my own flags with Petsc (or add to them)? > --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" Matt > Best regards, > Dominik -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
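To make the answer above concrete, the configure line being suggested has roughly the following shape; the cl switches shown (/MD, /O2) are only placeholders for whatever project-specific flags are needed, and the remaining options (MPI, BLAS/LAPACK, and so on) are left out of this sketch.

./config/configure.py --with-cc='win32fe cl' --with-cxx='win32fe cl' \
    --COPTFLAGS='/MD /O2' --CXXOPTFLAGS='/MD /O2'

As Satish explains further down, CFLAGS/CXXFLAGS can also be set directly, but then the defaults PETSc normally adds for the Microsoft compilers have to be repeated by hand, so the *OPTFLAGS route is usually the less error-prone one.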
URL: From ecoon at lanl.gov Thu Feb 17 11:18:27 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 10:18:27 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: <1297963107.14179.24.camel@echo.lanl.gov> On Thu, 2011-02-17 at 10:35 -0600, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but > apparently not. > Is it possible to create a general VecScatter from an > arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) > Vec with > the same global sizes (or same global IS sizes) but different > local > sizes? Shouldn't this just be a matter of relying upon the > implied > LocalToGlobalMapping? > > > No, the way you have to do this is to map a global Vec to a bunch of > sequential local Vecs with the sizes you want. This is also how we map > to overlapping arrays. > So effectively I need two scatters -- a scatter from the global Vec to the sequential local Vecs, then a scatter (which requires no communication) to inject the sequential Vecs into the new global Vec? Why? Am I missing something that makes the MPI to MPI scatter ill-posed as long as the global sizes (but not local sizes) are equal? This is mostly curiosity on my part... I think I have to do two scatters anyway since I'm working with multiple comms -- scatter from an MPI Vec on one sub-comm into local, sequential Vecs, then scatter those sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's the correct model for injecting an MPI Vec on one comm into an MPI Vec on PETSC_COMM_WORLD, correct? Ethan > > Matt > > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, > ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, > ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 > 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent > updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble > shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu > Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: > Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From knepley at gmail.com Thu Feb 17 11:22:46 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 11:22:46 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297963107.14179.24.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> <1297963107.14179.24.camel@echo.lanl.gov> Message-ID: On Thu, Feb 17, 2011 at 11:18 AM, Ethan Coon wrote: > On Thu, 2011-02-17 at 10:35 -0600, Matthew Knepley wrote: > > On Thu, Feb 17, 2011 at 10:06 AM, Ethan Coon wrote: > > So I thought I understood how VecScatters worked, but > > apparently not. > > Is it possible to create a general VecScatter from an > > arbitrarily > > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) > > Vec with > > the same global sizes (or same global IS sizes) but different > > local > > sizes? Shouldn't this just be a matter of relying upon the > > implied > > LocalToGlobalMapping? > > > > > > No, the way you have to do this is to map a global Vec to a bunch of > > sequential local Vecs with the sizes you want. This is also how we map > > to overlapping arrays. > > > > So effectively I need two scatters -- a scatter from the global Vec to > the sequential local Vecs, then a scatter (which requires no > communication) to inject the sequential Vecs into the new global Vec? > No, just wrap up the pieces of your global Vec as local Vecs and scatter straight into that storage using VecCreateSeqWithArray(). Matt > Why? Am I missing something that makes the MPI to MPI scatter ill-posed > as long as the global sizes (but not local sizes) are equal? > > This is mostly curiosity on my part... 
I think I have to do two scatters > anyway since I'm working with multiple comms -- scatter from an MPI Vec > on one sub-comm into local, sequential Vecs, then scatter those > sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's the correct > model for injecting an MPI Vec on one comm into an MPI Vec on > PETSC_COMM_WORLD, correct? > > Ethan > > > > > Matt > > > > See below snippet (and its errors): > > > > Ethan > > > > > > > > Vec vA > > Vec vB > > VecScatter scatter_AB > > > > PetscInt np > > PetscInt rank > > PetscErrorCode ierr > > > > if (rank.eq.0) np = 3 > > if (rank.eq.1) np = 1 > > > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, > > ierr) > > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, > > ierr) > > > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > > PETSC_NULL_OBJECT, > > scatter_AB, ierr) > > > > ... > > > > $> mpiexec -n 2 ./test > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Nonconforming object sizes! > > [0]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [1]PETSC ERROR: Nonconforming object sizes! > > [1]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: Petsc Development HG revision: > > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 > > 15:44:04 > > 2011 -0600 > > [0]PETSC ERROR: See docs/changes/index.html for recent > > updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble > > shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu > > Feb 17 > > 08:14:57 2011 > > [0]PETSC ERROR: Libraries linked > > from > /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > > [0]PETSC ERROR: Configure options --with-debugging=1 > > > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared > --download-mpich=1 --download-ml=1 --download-umfpack=1 > --with-blas-lapack-dir=/usr/lib --download-parmetis=yes > PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 > --with-shared-libraries=1 --download-hdf5=1 > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: VecScatterCreate() line 1432 in > > src/vec/vec/utils/vscat.c > > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: > > Hangup > > (signal 1) > > > > > > > > > > -- > > ------------------------------------ > > Ethan Coon > > Post-Doctoral Researcher > > Applied Mathematics - T-5 > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------ > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. 
> > -- Norbert Wiener > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Thu Feb 17 11:31:06 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Thu, 17 Feb 2011 10:31:06 -0700 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: References: <1297958784.14179.11.camel@echo.lanl.gov> <1297963107.14179.24.camel@echo.lanl.gov> Message-ID: <1297963866.14179.25.camel@echo.lanl.gov> > > So effectively I need two scatters -- a scatter from the > global Vec to > the sequential local Vecs, then a scatter (which requires no > communication) to inject the sequential Vecs into the new > global Vec? > > > No, just wrap up the pieces of your global Vec as local Vecs and > scatter > straight into that storage using VecCreateSeqWithArray(). > Ah ha! Thanks, Ethan > > Matt > > Why? Am I missing something that makes the MPI to MPI scatter > ill-posed > as long as the global sizes (but not local sizes) are equal? > > This is mostly curiosity on my part... I think I have to do > two scatters > anyway since I'm working with multiple comms -- scatter from > an MPI Vec > on one sub-comm into local, sequential Vecs, then scatter > those > sequential Vecs into an MPI Vec on PETSC_COMM_WORLD. That's > the correct > model for injecting an MPI Vec on one comm into an MPI Vec on > PETSC_COMM_WORLD, correct? > > Ethan > > > > > > Matt > > > > See below snippet (and its errors): > > > > Ethan > > > > > > > > Vec vA > > Vec vB > > VecScatter scatter_AB > > > > PetscInt np > > PetscInt rank > > PetscErrorCode ierr > > > > if (rank.eq.0) np = 3 > > if (rank.eq.1) np = 1 > > > > call VecCreateMPI(PETSC_COMM_WORLD, 2, > PETSC_DETERMINE, vA, > > ierr) > > call VecCreateMPI(PETSC_COMM_WORLD, np, > PETSC_DETERMINE, vB, > > ierr) > > > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, > > PETSC_NULL_OBJECT, > > scatter_AB, ierr) > > > > ... > > > > $> mpiexec -n 2 ./test > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Nonconforming object sizes! > > [0]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [1]PETSC ERROR: Nonconforming object sizes! > > [1]PETSC ERROR: Local scatter sizes don't match! > > [0]PETSC ERROR: Petsc Development HG revision: > > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: > Fri Feb 11 > > 15:44:04 > > 2011 -0600 > > [0]PETSC ERROR: See docs/changes/index.html for > recent > > updates. > > [0]PETSC ERROR: See docs/faq.html for hints about > trouble > > shooting. > > [0]PETSC ERROR: See docs/index.html for manual > pages. 
> > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by > ecoon Thu > > Feb 17 > > 08:14:57 2011 > > [0]PETSC ERROR: Libraries linked > > > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 > 2011 > > [0]PETSC ERROR: Configure options --with-debugging=1 > > > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > > [0]PETSC ERROR: > > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: VecScatterCreate() line 1432 in > > src/vec/vec/utils/vscat.c > > application called MPI_Abort(MPI_COMM_WORLD, 60) - > process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 60) - > process 0 > > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT > STRING: > > Hangup > > (signal 1) > > > > > > > > > > -- > > ------------------------------------ > > Ethan Coon > > Post-Doctoral Researcher > > Applied Mathematics - T-5 > > Los Alamos National Laboratory > > 505-665-8289 > > > > http://www.ldeo.columbia.edu/~ecoon/ > > ------------------------------------ > > > > > > > > > > -- > > What most experimenters take for granted before they begin > their > > experiments is infinitely more interesting than any results > to which > > their experiments lead. > > -- Norbert Wiener > > > -- > > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From balay at mcs.anl.gov Thu Feb 17 12:00:51 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 17 Feb 2011 12:00:51 -0600 (CST) Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: On Thu, 17 Feb 2011, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: > > > I need to use some special compile flags when compiling with 'cl' on > > Windows. > > While configuring I currently use --with-cxx='win32fe cl', which works > > fine, but if I add some flags after cl the configure brakes, > > complaining that the compiler does not work. > > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the > > same as before. > > Is there a way to specify my own flags with Petsc (or add to them)? > > > > --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" Generally CFLAGS should work. However with MS compilers - we have some defaults without which the compilers might not work. [esp with mpi]. So when changing CFLAGS one might have to include the defaults plus the additional flags. However COPTFLAGS migh be easier to add to CFLAGS - and provided to primarily specify optimization flags - but can be used for for other flags aswell.. 
Satish From dominik at itis.ethz.ch Thu Feb 17 12:07:22 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Thu, 17 Feb 2011 19:07:22 +0100 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: Many thanks for your explanations! On Thu, Feb 17, 2011 at 7:00 PM, Satish Balay wrote: > On Thu, 17 Feb 2011, Matthew Knepley wrote: > >> On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: >> >> > I need to use some special compile flags when compiling with 'cl' on >> > Windows. >> > While configuring I currently use --with-cxx='win32fe cl', which works >> > fine, but if I add some flags after cl the configure brakes, >> > complaining that the compiler does not work. >> > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the >> > same as before. >> > Is there a way to specify my own flags with Petsc (or add to them)? >> > >> >> --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" > > Generally CFLAGS should work. However with MS compilers - we have some > defaults without which the compilers might not work. [esp with mpi]. > So when changing CFLAGS one might have to include the defaults plus > the additional flags. > > However COPTFLAGS migh be easier to add to CFLAGS - and provided to > primarily specify optimization flags - but can be used for for other > flags aswell.. > > Satish > > From mgrabbani at gmail.com Thu Feb 17 12:22:14 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 17 Feb 2011 10:22:14 -0800 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: Thanks for reply. I have not used different PETSC_ARCH; in all the installations it has the same value of linux-gnu-c-debug, but the different installations are in completely separate folders and extra packages like lapack, mpi are also installed separately. Can I delete a folder containing a full installation and not affect my system? Thanks Golam On Wed, Feb 16, 2011 at 7:20 PM, Barry Smith wrote: > > If you just used different PETSC_ARCH variables like arch-complex arch-opt > etc you can just delete those directories and that will remove everything > related to them. > > Barry > > On Feb 16, 2011, at 9:18 PM, Golam Rabbani wrote: > > > Hi, > > > > I am new to petsc and also new to linux/unix environment. > > Recently I started converting a matlab nanodevice simulaiton code into a > c code using petsc and have done most of the converting. > > But in the process I have installed petsc 4/5 times with different > configuration options (real, complex, with blopex, etc). Now that I > > do not need some of them, I want to uninstall them. Can I simply delete > the unwanted versions (all the versions are in separate folders)? > > Please advise me if something different is needed. > > > > Thanks in advance, > > Golam > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 17 12:26:20 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 17 Feb 2011 19:26:20 +0100 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 19:22, Golam Rabbani wrote: > Can I delete a folder containing a full installation and not affect my > system? Yes. For future reference, it is easier to stay current if you use different values of PETSC_ARCH instead of different PETSC_DIR. -------------- next part -------------- An HTML attachment was scrubbed... 
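A sketch of the single-tree workflow Barry and Jed describe above: one source directory, several PETSC_ARCH builds that can be added and removed independently. The arch names and configure options here are only examples.

cd $PETSC_DIR
./config/configure.py PETSC_ARCH=arch-real-debug --with-scalar-type=real
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-real-debug all
./config/configure.py PETSC_ARCH=arch-complex-opt --with-scalar-type=complex --with-debugging=0
make PETSC_DIR=$PETSC_DIR PETSC_ARCH=arch-complex-opt all
rm -rf $PETSC_DIR/arch-complex-opt   # deleting an arch directory removes everything built for it

Each application then selects a build by setting PETSC_DIR and PETSC_ARCH in its makefile or environment.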
URL: From mgrabbani at gmail.com Thu Feb 17 12:45:47 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 17 Feb 2011 10:45:47 -0800 Subject: [petsc-users] Removing petsc In-Reply-To: References: Message-ID: Thanks, I will keep your advice in mind. On Thu, Feb 17, 2011 at 10:26 AM, Jed Brown wrote: > On Thu, Feb 17, 2011 at 19:22, Golam Rabbani wrote: > >> Can I delete a folder containing a full installation and not affect my >> system? > > > Yes. > > For future reference, it is easier to stay current if you use different > values of PETSC_ARCH instead of different PETSC_DIR. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Thu Feb 17 15:21:43 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 21:21:43 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> Message-ID: <201102172121.46026.juhaj@iki.fi> > Yes, if your BC do not give at least a locally unique solution, then your > Jacobian will > be rank deficient and Newton breaks down. You can still try Picard, but I > recommend > understanding what you mean by a solution first. Thanks for confirming. And for the suggestion to try Picard, but it simply shoots out to somewhere in the direction of Alpha Centauri or some such: reaches function values in excess of 1.e+34 in less than ten SNES iterations... Perhaps there is such a solution, but that is certainly not what I want. Especially since I think I figured out an alternative boundary condition. But I do not know how to implement that in PETSc. How do I require f'(xmax) = constant_A f(xmax) = constant_B and no condition (I could require f(xmin)=0, but that is exactly the non- condition I discovered) at xmin? I did not find any examples of how to do this and it does not seem to be straightforward. Do I need to convert from f, f', f'' to f, g, g' with g=f' to change the f'(xmax) condition to a Dirichlet one for g? But that does not seem to be feasible since I can not think of what equation g (or f') should obey in the interior - recall that I just have a single equation, F(f, f', f'') = 0. Cheers, -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From bsmith at mcs.anl.gov Thu Feb 17 15:41:57 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 17 Feb 2011 15:41:57 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102172121.46026.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102171408.40609.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> Message-ID: <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> On boundary points where you want your mathematical solution x*| at that point = a you need to use for your coded function f(x) = x - a. Its derivative is f'(x) = 1 which is nonzero is fine. If the derivative at other points is order K you can use f(x) = K*(x - a) so the derivate at that point is K. Barry On Feb 17, 2011, at 3:21 PM, Juha J?ykk? wrote: >> Yes, if your BC do not give at least a locally unique solution, then your >> Jacobian will >> be rank deficient and Newton breaks down. 
You can still try Picard, but I >> recommend >> understanding what you mean by a solution first. > > Thanks for confirming. And for the suggestion to try Picard, but it simply > shoots out to somewhere in the direction of Alpha Centauri or some such: > reaches function values in excess of 1.e+34 in less than ten SNES > iterations... Perhaps there is such a solution, but that is certainly not what > I want. > > Especially since I think I figured out an alternative boundary condition. But > I do not know how to implement that in PETSc. > > How do I require > > f'(xmax) = constant_A > f(xmax) = constant_B > > and no condition (I could require f(xmin)=0, but that is exactly the non- > condition I discovered) at xmin? > > I did not find any examples of how to do this and it does not seem to be > straightforward. Do I need to convert from f, f', f'' to f, g, g' with g=f' to > change the f'(xmax) condition to a Dirichlet one for g? But that does not seem > to be feasible since I can not think of what equation g (or f') should obey in > the interior - recall that I just have a single equation, F(f, f', f'') = 0. > > Cheers, > -Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From bsmith at mcs.anl.gov Thu Feb 17 15:58:06 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 17 Feb 2011 15:58:06 -0600 Subject: [petsc-users] general VecScatter from MPI to MPI In-Reply-To: <1297958784.14179.11.camel@echo.lanl.gov> References: <1297958784.14179.11.camel@echo.lanl.gov> Message-ID: <5DE8FC2C-CE4A-42CC-AD1A-D86375730DAC@mcs.anl.gov> Ethan, It is perfectly possible to map from one global MPI vector to another global MPI vector. The vectors can be different or the same sizes and have different or the same layouts. It is just not possible to use DEFAULT is in the from and two positions at the same time. Reason: if you don't provide either it doesn't have a way of generating both that are compatible with each other. I will add an error check for that case. You should just generate the IS's that you need. Ignore Matt's ravings, you don't need to wrapping nothing in no SeqVec to scatter from MPI to MPI. Just provide the ISs. Barry On Feb 17, 2011, at 10:06 AM, Ethan Coon wrote: > So I thought I understood how VecScatters worked, but apparently not. > Is it possible to create a general VecScatter from an arbitrarily > partitioned (MPI) Vec to another arbitrarily partitioned (MPI) Vec with > the same global sizes (or same global IS sizes) but different local > sizes? Shouldn't this just be a matter of relying upon the implied > LocalToGlobalMapping? > > See below snippet (and its errors): > > Ethan > > > > Vec vA > Vec vB > VecScatter scatter_AB > > PetscInt np > PetscInt rank > PetscErrorCode ierr > > if (rank.eq.0) np = 3 > if (rank.eq.1) np = 1 > > call VecCreateMPI(PETSC_COMM_WORLD, 2, PETSC_DETERMINE, vA, ierr) > call VecCreateMPI(PETSC_COMM_WORLD, np, PETSC_DETERMINE, vB, ierr) > > call VecScatterCreate(vA, PETSC_NULL_OBJECT, vB, PETSC_NULL_OBJECT, > scatter_AB, ierr) > > ... > > $> mpiexec -n 2 ./test > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Nonconforming object sizes! > [0]PETSC ERROR: Local scatter sizes don't match! 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [1]PETSC ERROR: Nonconforming object sizes! > [1]PETSC ERROR: Local scatter sizes don't match! > [0]PETSC ERROR: Petsc Development HG revision: > 5dbe1264252fb9cb5d8e033d620d18f7b0e9111f HG Date: Fri Feb 11 15:44:04 > 2011 -0600 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./test on a linux-gnu named tama1 by ecoon Thu Feb 17 > 08:14:57 2011 > [0]PETSC ERROR: Libraries linked > from /packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared/lib > [0]PETSC ERROR: Configure run at Fri Feb 11 16:15:14 2011 > [0]PETSC ERROR: Configure options --with-debugging=1 > --prefix=/packages/petsc/petsc-dev3.0-mpich2-local-gcc-4.3.3/debug-shared --download-mpich=1 --download-ml=1 --download-umfpack=1 --with-blas-lapack-dir=/usr/lib --download-parmetis=yes PETSC_ARCH=linux-gnu-c-debug-shared --with-clanguage=c --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: VecScatterCreate() line 1432 in > src/vec/vec/utils/vscat.c > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 60) - process 0 > [1]PETSC ERROR: APPLICATION TERMINATED WITH THE EXIT STRING: Hangup > (signal 1) > > > > > -- > ------------------------------------ > Ethan Coon > Post-Doctoral Researcher > Applied Mathematics - T-5 > Los Alamos National Laboratory > 505-665-8289 > > http://www.ldeo.columbia.edu/~ecoon/ > ------------------------------------ > From juhaj at iki.fi Thu Feb 17 17:01:07 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Thu, 17 Feb 2011 23:01:07 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> Message-ID: <201102172301.16497.juhaj@iki.fi> > On boundary points where you want your mathematical solution x*| at that > point = a you need to use for your coded function f(x) = x - a. Its > derivative is f'(x) = 1 which is nonzero is fine. If the derivative at > other points is order K you can use f(x) = K*(x - a) so the derivate at > that point is K. I am not sure, I understood this. Just to make sure there is no confusion with the notation, my unknown function be called f and my independent variable x and f is defined for 0 <= x <= 1. I use f' for the derivative of f. The nonlinear equation I want to solve is F(f,f',f'',x)=0. So, if I want f(1) = a and f'(1) = b, should I set the F(1) = b*(f-a) in the code? Will that not give 0 residual when f(1)=a regardless of it derivative? Or, alternatively, is my approach totally wrong to begin with? I took a step back and started to work with r f''/f - r (f'/f)^2 + f'/f = 0 only and cannot get it to converge any more than my actual problem. Now, for this I even know the general solution, so it should be easy to solve this for f(1)=1, f'(1)=2 (or 1/2, but that has singular derivative at 0, so perhaps it is not a good example). 
Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From gaurish108 at gmail.com Thu Feb 17 21:10:44 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Thu, 17 Feb 2011 22:10:44 -0500 Subject: [petsc-users] Least squares: using unpreconditioned LSQR Message-ID: Hi, I wanted to solve some least squares problems using PETSc. My test matrix is size 3x2 but I wish to use this code for solving large ill-conditioned rectangular systems later. Looking at the PETSc manual I the found the KSPLSQR routine which implements the LSQR algorithm. However I am unsure how to use this routine. I am pasting the lines of the code which I use to set up the solver. Through the terminal I pass the option -ksp_type lsqr while running exectuable. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); As you can see I have used PETSC_NULL for the preconditioner matrix since I wish to use the *unpreconditioned* version of the LSQR algorithm. This gives me an error. If I pass the matrix A it gives me an error again. I am not sure how to tell PETSc not to use a preconditioner. Could you please tell me how I should use KSPSetOperators statement in this case to use the unpreconditioned algorithm. If you have a better sparse matrix least squares algorithm implemented please let me know. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at mcs.anl.gov Thu Feb 17 21:19:42 2011 From: abhyshr at mcs.anl.gov (Shri) Date: Thu, 17 Feb 2011 21:19:42 -0600 (CST) Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: Message-ID: <1632480610.70640.1297999182044.JavaMail.root@zimbra.anl.gov> Use the option -pc_type none. ----- Original Message ----- Hi, I wanted to solve some least squares problems using PETSc. My test matrix is size 3x2 but I wish to use this code for solving large ill-conditioned rectangular systems later. Looking at the PETSc manual I the found the KSPLSQR routine which implements the LSQR algorithm. However I am unsure how to use this routine. I am pasting the lines of the code which I use to set up the solver. Through the terminal I pass the option -ksp_type lsqr while running exectuable. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); As you can see I have used PETSC_NULL for the preconditioner matrix since I wish to use the *unpreconditioned* version of the LSQR algorithm. This gives me an error. If I pass the matrix A it gives me an error again. I am not sure how to tell PETSc not to use a preconditioner. Could you pease tell me how I should use KSPSetOperators statement in this case to use the unpreconditioned algorithm. If you have a better sparse matrix least squares algorithm implemented please let me know. -------------- next part -------------- An HTML attachment was scrubbed... 
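Combining the advice in this thread (use -pc_type none, and, as the replies below also note, pass A itself as the preconditioner matrix rather than PETSC_NULL), the setup might look like the sketch below. It assumes Mat A and Vec b, x are already created and assembled as in the snippet above, and it is the programmatic equivalent of running with -ksp_type lsqr -pc_type none.

KSP            ksp;
PC             pc;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* pass A, not PETSC_NULL, as the Pmat */
ierr = KSPSetType(ksp,KSPLSQR);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCNONE);CHKERRQ(ierr);       /* unpreconditioned LSQR */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* command-line options can still override */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = KSPDestroy(ksp);CHKERRQ(ierr);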
URL: From knepley at gmail.com Thu Feb 17 21:20:47 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Feb 2011 21:20:47 -0600 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > Hi, > > I wanted to solve some least squares problems using PETSc. My test matrix > is size 3x2 but I wish to use this code for solving large ill-conditioned > rectangular systems later. > > Looking at the PETSc manual I the found the KSPLSQR routine which > implements the LSQR algorithm. > > However I am unsure how to use this routine. I am pasting the lines of the > code which I use to set up the solver. > > Through the terminal I pass the option -ksp_type lsqr while running > exectuable. > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > ierr = > KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > Pass A, not PETSC_NULL. Matt > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > As you can see I have used PETSC_NULL for the preconditioner matrix since I > wish to use the *unpreconditioned* version of the LSQR algorithm. This gives > me an error. > > If I pass the matrix A it gives me an error again. I am not sure how to > tell PETSc not to use a preconditioner. > > Could you please tell me how I should use KSPSetOperators statement in this > case to use the unpreconditioned algorithm. > > If you have a better sparse matrix least squares algorithm implemented > please let me know. > > > > > > > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 17 22:01:50 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 17 Feb 2011 22:01:50 -0600 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: You can use '-ksp_type lsqr -pc_type none' Hong On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > Hi, > > I wanted to solve some least squares problems using PETSc. My test matrix is > size 3x2 but I wish to use this code for solving large ill-conditioned > rectangular systems later. > > Looking at the PETSc manual I the found the KSPLSQR routine which implements > the LSQR algorithm. > > However I am unsure how to use this routine. I am pasting the lines of the > code which I use to set up the solver. > > Through the terminal I pass the option -ksp_type lsqr while running > exectuable. > > ?ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > > ?ierr = > KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > > ?ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > ?ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > As you can see I have used PETSC_NULL for the preconditioner matrix since I > wish to use the *unpreconditioned* version of the LSQR algorithm. This gives > me an error. > > If I pass the matrix A it gives me an error again. I am not sure how to tell > PETSc not to use a preconditioner. > > Could you please tell me how I should use KSPSetOperators statement in this > case to use the unpreconditioned algorithm. 
> > If you have a better sparse matrix least squares algorithm implemented > please let me know. > > > > > > > > > > > > > > > > > > > > From gaurish108 at gmail.com Thu Feb 17 22:40:50 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Thu, 17 Feb 2011 23:40:50 -0500 Subject: [petsc-users] Least squares: using unpreconditioned LSQR In-Reply-To: References: Message-ID: Thank you. On using A in place of Pmat position in the KSPSetOperators list of arguments, I was able to get the small test system to work. On Thu, Feb 17, 2011 at 10:20 PM, Matthew Knepley wrote: > On Thu, Feb 17, 2011 at 9:10 PM, Gaurish Telang wrote: > >> Hi, >> >> I wanted to solve some least squares problems using PETSc. My test matrix >> is size 3x2 but I wish to use this code for solving large ill-conditioned >> rectangular systems later. >> >> Looking at the PETSc manual I the found the KSPLSQR routine which >> implements the LSQR algorithm. >> >> However I am unsure how to use this routine. I am pasting the lines of the >> code which I use to set up the solver. >> >> Through the terminal I pass the option -ksp_type lsqr while running >> exectuable. >> >> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >> >> ierr = >> KSPSetOperators(ksp,A,PETSC_NULL,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >> > > Pass A, not PETSC_NULL. > > Matt > > >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> >> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >> >> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >> >> As you can see I have used PETSC_NULL for the preconditioner matrix since >> I wish to use the *unpreconditioned* version of the LSQR algorithm. This >> gives me an error. >> >> If I pass the matrix A it gives me an error again. I am not sure how to >> tell PETSc not to use a preconditioner. >> >> Could you please tell me how I should use KSPSetOperators statement in >> this case to use the unpreconditioned algorithm. >> >> If you have a better sparse matrix least squares algorithm implemented >> please let me know. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Fri Feb 18 01:26:18 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 02:26:18 -0500 Subject: [petsc-users] Running an iterative method for a large number of iterations: Possible blow up? Message-ID: Hi, I was trying to use LSQR algorithm for solving a least squares problem of size 2683x1274. I notice that if I allow the iterative method to run for a large number of iterations after it has converged (i.e. output of -ksp_monitor KSPresidualnorm seems constant upto the 4th digit) , some numbers in the answer vector seem to get inordinately large. I seem to get my answer comparable to Matlab after 951 iterations, but when I increase the number of iterations to 10000 some numbers seem very large. Is this expected? Also, how do I terminate my iteration when my residual norm seems constant? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Fri Feb 18 02:26:08 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 16:26:08 +0800 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
Message-ID: <59E77951AFD2405782C315F25FF16932@cogendaeda> Hi, After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? Or it has already done when MatAssembly is called? From loic.gouarin at math.u-psud.fr Fri Feb 18 03:22:48 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 10:22:48 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS Message-ID: <4D5E3A68.6020805@math.u-psud.fr> Hi, I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. I have done a parallel version and it works for very small size. But it's very slow. I use DA to create my grid. Here is an example DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); The matrix problem has the form [ A B1] [ B2 0 ] and the preconditioner is [ A 0] [ 0 M] I use fgmres to solve my system and I want to use MUMPS to solve the linear system Ax=b for the preconditioning step. I want to solve multiple time this problem with different second member. The first problem is when I call Mat A, B; DAGetMatrix(da, MATAIJ, &A); DAGetMatrix(da, MATAIJ, &B); It takes a long time in 3D and I don't understand why. I keep the debug version of Petsc but i don't think that it is the problem. After that MUMPS can't do the factorization for nv=33 in my grid definition (the case nv=19 works) because there is not enough memory. I have the error [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: Cannot allocate required memory 1083974153 megabytes But my problem is not very big for the moment! I want to be able to solve Stokes problem on a grid 128x128x128 for the velocity field. Here is my script to launch the program mpiexec -np 1 ./stokesPart \ -stokes_ksp_type fgmres \ -stokes_ksp_rtol 1e-6 \ -stokes_pc_type fieldsplit \ -stokes_pc_fieldsplit_block_size 4 \ -stokes_pc_fieldsplit_0_fields 0,1,2 \ -stokes_pc_fieldsplit_1_fields 3 \ -stokes_fieldsplit_0_ksp_type preonly \ -stokes_fieldsplit_0_pc_type lu \ -stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ -stokes_fieldsplit_1_ksp_type preonly \ -stokes_fieldsplit_1_pc_type jacobi \ -stokes_ksp_monitor_short I compile Petsc-3.1-p7 with the options --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 --download-mumps=yes --with-parmetis=1 --download-parmetis=yes --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 --download-scalapack=yes --with-blacs=1 --download-blacs=yes I think I have to put some MUMPS options but I don't know exactly what. Could you tell me what I do wrong? Best Regards, Loic -------------- next part -------------- A non-text attachment was scrubbed... Name: loic_gouarin.vcf Type: text/x-vcard Size: 551 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Fri Feb 18 03:49:09 2011 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 18 Feb 2011 10:49:09 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E3A68.6020805@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> Message-ID: Hey Loic, I think the problem is clear from the error message. 
You simply don't have enough memory to perform the LU decomposition. >From the info you provide I can see a numerous places here your code is using significant amounts of memory (without even considering the LU factorisation). 1) The DA you create uses a stencil width of 2. This is actually not required for your element type. Stencil width of one is sufficient. 2) DAGetMatrix is allocating memory of the (2,2) block (the zero matrix) in your stokes system. 3) The precondioner matrix B created with DAGetMatrix is allocating memory of the off-diagonal blocks (1,2) and (2,1) which you don't use. 4) Fieldsplit (additive) is copying the matrices associated with the (1,1) and (2,2) block from the preconditioner. Representing this complete element type of the DA is not a good idea, however representing the velocity space and pressure space on different DA's is fine. Doing this would allow a stencil width of one to be used for both velocity and pressure - which is that is required. You can connect the two DA's for velocity and pressure via DMComposite. Unfortunately, DMComposite cannot create and preallocate the off-diagonal matrices for you, but it can create and preallocate memory for the diagonal blocks. You would have to provide the preallocation routine for the off-diagonal blocks. I would recommend switching to petsc-dev as there is much more support for this type of "multi-physics" coupling. I doubt you will ever be able to solve your 128^3 problem using MUMPS to factor your (1,1) block. The memory required is simply to great, you will have to consider using a multilevel preconditioner. Can you solve your problem using ML or BoomerAMG? Cheers, Dave On 18 February 2011 10:22, gouarin wrote: > Hi, > > I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. > I have done a parallel version and it works for very small size. But it's > very slow. > > I use DA to create my grid. Here is an example > > DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, > ? ? ? ? ? ? ? ? ? nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, > ? ? ? ? ? ? ? ? ? 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); > > The matrix problem has the form > > [ A ? ?B1] > [ B2 ? 0 ] > > and the preconditioner is > > [ A ?0] > [ 0 ?M] > > I use fgmres to solve my system and I want to use MUMPS to solve the linear > system Ax=b for the preconditioning step. > I want to solve multiple time this problem with different second member. > > The first problem is when I call > > ?Mat A, B; > ?DAGetMatrix(da, MATAIJ, &A); > ?DAGetMatrix(da, MATAIJ, &B); > > It takes a long time in 3D and I don't understand why. I keep the debug > version of Petsc but i don't think that it is the problem. > > After that MUMPS can't do the factorization for nv=33 in my grid definition > (the case nv=19 works) because there is not enough memory. I have the error > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > Cannot allocate required memory 1083974153 megabytes > > But my problem is not very big for the moment! I want to be able to solve > Stokes problem on a grid 128x128x128 for the velocity field. > > Here is my script to launch the program > > mpiexec -np 1 ./stokesPart \ > ? ? -stokes_ksp_type fgmres \ > ? ? -stokes_ksp_rtol 1e-6 \ > ? ? -stokes_pc_type fieldsplit \ > ? ? -stokes_pc_fieldsplit_block_size 4 \ > ? ? -stokes_pc_fieldsplit_0_fields 0,1,2 \ > ? ? -stokes_pc_fieldsplit_1_fields 3 \ > ? ? -stokes_fieldsplit_0_ksp_type preonly \ > ? ? -stokes_fieldsplit_0_pc_type lu \ > ? ? 
-stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ > ? ? -stokes_fieldsplit_1_ksp_type preonly \ > ? ? -stokes_fieldsplit_1_pc_type jacobi \ > ? ? -stokes_ksp_monitor_short > > I compile Petsc-3.1-p7 with the options > > --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 > --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 > --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 > --download-mumps=yes --with-parmetis=1 --download-parmetis=yes > --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 > --download-scalapack=yes --with-blacs=1 --download-blacs=yes > > I think I have to put some MUMPS options but I don't know exactly what. > > Could you tell me what I do wrong? > > Best Regards, > > Loic > From loic.gouarin at math.u-psud.fr Fri Feb 18 04:45:08 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 11:45:08 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> Message-ID: <4D5E4DB4.5010706@math.u-psud.fr> Hi Dave, thanks for your quick reply. On 18/02/2011 10:49, Dave May wrote: > Hey Loic, > I think the problem is clear from the error message. > You simply don't have enough memory to perform the LU decomposition. > > > From the info you provide I can see a numerous places here your code > is using significant amounts of memory (without even considering the > LU factorisation). > > 1) The DA you create uses a stencil width of 2. This is actually not > required for your element type. Stencil width of one is sufficient. > > 2) DAGetMatrix is allocating memory of the (2,2) block (the zero > matrix) in your stokes system. > > 3) The precondioner matrix B created with DAGetMatrix is allocating > memory of the off-diagonal blocks (1,2) and (2,1) which you don't use. > > 4) Fieldsplit (additive) is copying the matrices associated with the > (1,1) and (2,2) block from the preconditioner. > > > Representing this complete element type of the DA is not a good idea, > however representing the velocity space and pressure space on > different DA's is fine. Doing this would allow a stencil width of one > to be used for both velocity and pressure - which is that is required. > You can connect the two DA's for velocity and pressure via > DMComposite. Unfortunately, DMComposite cannot create and preallocate > the off-diagonal matrices for you, but it can create and preallocate > memory for the diagonal blocks. You would have to provide the > preallocation routine for the off-diagonal blocks. > Ok. It's the first time that I use Petsc for a big problem and I don't yet see pretty well all the possibilities of Petsc. When you say: "You would have to provide the preallocation routine for the off-diagonal block", do you talk about DASetBlockFills or is it more complicated ? > I would recommend switching to petsc-dev as there is much more support > for this type of "multi-physics" coupling. > I see that DMDA is more advanced in petsc-dev. I'll try it. > I doubt you will ever be able to solve your 128^3 problem using MUMPS > to factor your (1,1) block. The memory required is simply to great, > you will have to consider using a multilevel preconditioner. > > Can you solve your problem using ML or BoomerAMG? > I tried different solvers. ML doesn't work. 
I used hypre with this script mpiexec -np 4 ./stokesPart \ -stokes_ksp_type minres \ -stokes_ksp_rtol 1e-6 \ -stokes_pc_type fieldsplit \ -stokes_pc_fieldsplit_block_size 4 \ -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE \ -stokes_pc_fieldsplit_0_fields 0,1,2 \ -stokes_pc_fieldsplit_1_fields 3 \ -stokes_fieldsplit_0_ksp_type richardson \ -stokes_fieldsplit_0_ksp_max_it 1 \ -stokes_fieldsplit_0_pc_type hypre \ -stokes_fieldsplit_0_pc_hypre_type boomeramg\ -stokes_fieldsplit_0_pc_hypre_boomeramg_max_iter 1 \ -stokes_fieldsplit_1_ksp_type preonly \ -stokes_fieldsplit_1_pc_type jacobi \ -stokes_ksp_monitor_short but once again, it's very slow or out of memory. Perhaps my options are not good ... The problem is that I don't have to solve Stokes problem just one time but multiple time so I have to do this as fast as possible. Thanks again, Loic > Cheers, > Dave > > > On 18 February 2011 10:22, gouarin wrote: >> Hi, >> >> I want to solve 3D Stokes problem with 4Q1/Q1 finite element discretization. >> I have done a parallel version and it works for very small size. But it's >> very slow. >> >> I use DA to create my grid. Here is an example >> >> DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX, >> nv, nv, nv, PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE, >> 4,2,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da); >> >> The matrix problem has the form >> >> [ A B1] >> [ B2 0 ] >> >> and the preconditioner is >> >> [ A 0] >> [ 0 M] >> >> I use fgmres to solve my system and I want to use MUMPS to solve the linear >> system Ax=b for the preconditioning step. >> I want to solve multiple time this problem with different second member. >> >> The first problem is when I call >> >> Mat A, B; >> DAGetMatrix(da, MATAIJ,&A); >> DAGetMatrix(da, MATAIJ,&B); >> >> It takes a long time in 3D and I don't understand why. I keep the debug >> version of Petsc but i don't think that it is the problem. >> >> After that MUMPS can't do the factorization for nv=33 in my grid definition >> (the case nv=19 works) because there is not enough memory. I have the error >> >> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> Cannot allocate required memory 1083974153 megabytes >> >> But my problem is not very big for the moment! I want to be able to solve >> Stokes problem on a grid 128x128x128 for the velocity field. >> >> Here is my script to launch the program >> >> mpiexec -np 1 ./stokesPart \ >> -stokes_ksp_type fgmres \ >> -stokes_ksp_rtol 1e-6 \ >> -stokes_pc_type fieldsplit \ >> -stokes_pc_fieldsplit_block_size 4 \ >> -stokes_pc_fieldsplit_0_fields 0,1,2 \ >> -stokes_pc_fieldsplit_1_fields 3 \ >> -stokes_fieldsplit_0_ksp_type preonly \ >> -stokes_fieldsplit_0_pc_type lu \ >> -stokes_fieldsplit_0_pc_factor_mat_solver_package mumps\ >> -stokes_fieldsplit_1_ksp_type preonly \ >> -stokes_fieldsplit_1_pc_type jacobi \ >> -stokes_ksp_monitor_short >> >> I compile Petsc-3.1-p7 with the options >> >> --with-mpi4py=1 --download-mpi4py=yes --with-petsc4py=1 >> --download-petsc4py=yes --with-shared --with-dynamic --with-hypre=1 >> --download-hypre=yes --with-ml=1 --download-ml=yes --with-mumps=1 >> --download-mumps=yes --with-parmetis=1 --download-parmetis=yes >> --with-prometheus=1 --download-prometheus=yes --with-scalapack=1 >> --download-scalapack=yes --with-blacs=1 --download-blacs=yes >> >> I think I have to put some MUMPS options but I don't know exactly what. >> >> Could you tell me what I do wrong? >> >> Best Regards, >> >> Loic >> -- Loic Gouarin Laboratoire de Math?matiques Universit? 
Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From dave.mayhem23 at gmail.com Fri Feb 18 05:07:30 2011 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 18 Feb 2011 12:07:30 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E4DB4.5010706@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> Message-ID: > Ok. It's the first time that I use Petsc for a big problem and I don't yet > see pretty well all the possibilities of Petsc. > When you say: "You would have to provide the preallocation routine for the > off-diagonal block", do you talk about DASetBlockFills or is it more > complicated ? Not exactly. I meant if you use DMComposite, you would have to provide the preallocation routines for the off-diagonal blocks. If you continue to use a single to represent u,v,w,p, then I think DASetBlockFills() would let you control which chunks in the operator and preconditioner get allocated when you call DAGetMatrix. > I tried different solvers. ML doesn't work. I used hypre with this script > > mpiexec -np 4 ./stokesPart \ > ? ? -stokes_ksp_type minres \ > ? ? -stokes_ksp_rtol 1e-6 \ > ? ? -stokes_pc_type fieldsplit \ > ? ? -stokes_pc_fieldsplit_block_size 4 \ > ? ? -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE \ > ? ? -stokes_pc_fieldsplit_0_fields 0,1,2 \ > ? ? -stokes_pc_fieldsplit_1_fields 3 \ > ? ? -stokes_fieldsplit_0_ksp_type richardson \ > ? ? -stokes_fieldsplit_0_ksp_max_it 1 \ > ? ? -stokes_fieldsplit_0_pc_type hypre \ > ? ? -stokes_fieldsplit_0_pc_hypre_type boomeramg\ > ? ? -stokes_fieldsplit_0_pc_hypre_boomeramg_max_iter 1 \ > ? ? -stokes_fieldsplit_1_ksp_type preonly \ > ? ? -stokes_fieldsplit_1_pc_type jacobi \ > ? ? -stokes_ksp_monitor_short > > but once again, it's very slow or out of memory. Perhaps my options are not > good ... > How much memory is used when you use -stokes_fieldsplit_0_ksp_max_it 1 -stokes_fieldsplit_0_pc_type jacobi ? It's possible that the copy of the diagonal blocks occurring when you invoke Fieldsplit just by itself is using all your available memory. I wouldn't be surprised with a stencil width of 2.... From zonexo at gmail.com Fri Feb 18 05:20:16 2011 From: zonexo at gmail.com (TAY wee-beng) Date: Fri, 18 Feb 2011 12:20:16 +0100 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries Message-ID: <4D5E55F0.8000407@gmail.com> Hi, I am trying to solve the Navier Stokes momentum equation of a moving body. For most points (a), I will be using the north/south/east/west locations to form the equation. However, for some points (b), due to the moving body, I will be using some interpolation schemes. At different time step, the interpolation template will be different for these points. Hence, I will use different neighboring points to form the equation. Moreover, points (a) can change to points (b) and vice versa. I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. But in the manual, it states that "For sparse matrices this routine retains the old nonzero structure. ". However for my case, the template is different at different time step. Hence what is the most efficient procedure? -- Yours sincerely, TAY wee-beng From knepley at gmail.com Fri Feb 18 05:44:59 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 05:44:59 -0600 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
In-Reply-To: <59E77951AFD2405782C315F25FF16932@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: 2011/2/18 Gong Ding > Hi, > After update my FVM code to support higher order, I have to preallocate > more memory when creating the matrix. However, only a few cells (determined > at run time) needed to be high order, thus preallocated memory is overkill > too much. > > Is it possible to add a function to reassemble the AIJ matrix to free the > extra memory? > Or it has already done when MatAssembly is called? > This is done during MatAssemblyEnd(). However, there is no guarantee that the operating system actually returns that memory to general use. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 18 05:57:11 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 05:57:11 -0600 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries In-Reply-To: <4D5E55F0.8000407@gmail.com> References: <4D5E55F0.8000407@gmail.com> Message-ID: On Fri, Feb 18, 2011 at 5:20 AM, TAY wee-beng wrote: > Hi, > > I am trying to solve the Navier Stokes momentum equation of a moving body. > > For most points (a), I will be using the north/south/east/west locations to > form the equation. > > However, for some points (b), due to the moving body, I will be using some > interpolation schemes. At different time step, the interpolation template > will be different for these points. Hence, I will use different neighboring > points to form the equation. > > Moreover, points (a) can change to points (b) and vice versa. > > I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. > But in the manual, it states that "For sparse matrices this routine retains > the old nonzero structure. ". However for my case, the template is different > at different time step. > > Hence what is the most efficient procedure? > There is no efficiently, updateable data structure in PETSc, since this would be much lower performance for general use. I suggest using a matrix-free application of your full operator, and a fixed sparsity operator for your preconditioner. Alternatively, you can rebuild the matrix structure at each step, which might be the best option depending on how much work you do in each solve. Matt > -- > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.gouarin at math.u-psud.fr Fri Feb 18 06:00:09 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Fri, 18 Feb 2011 13:00:09 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> Message-ID: <4D5E5F49.9090001@math.u-psud.fr> On 18/02/2011 12:07, Dave May wrote: > > How much memory is used when you use > -stokes_fieldsplit_0_ksp_max_it 1 > -stokes_fieldsplit_0_pc_type jacobi > ? > It's possible that the copy of the diagonal blocks occurring when you > invoke Fieldsplit just by itself is using all your available memory. I > wouldn't be surprised with a stencil width of 2.... 
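As a point of reference before the memory log below, a rough sketch of the split layout Dave suggests (one DA for the three velocity components, one for pressure, both with stencil width 1), written against the PETSc 3.1 DACreate3d call quoted earlier in this thread; nv is the same placeholder grid size, and the DMComposite coupling plus the off-diagonal preallocation are deliberately left out.

#include "petscda.h"

/* Sketch only: two DAs with stencil width 1 instead of one 4-dof DA with width 2. */
PetscErrorCode create_stokes_das(PetscInt nv, DA *da_vel, DA *da_p)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* 3 velocity dofs per node */
  ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                    nv,nv,nv,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                    3,1,PETSC_NULL,PETSC_NULL,PETSC_NULL,da_vel);CHKERRQ(ierr);
  /* 1 pressure dof per node */
  ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                    nv,nv,nv,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                    1,1,PETSC_NULL,PETSC_NULL,PETSC_NULL,da_p);CHKERRQ(ierr);
  /* DAGetMatrix(*da_vel,MATAIJ,&A) and DAGetMatrix(*da_p,MATAIJ,&M) then give the
     preallocated diagonal blocks; the B1/B2 blocks still need hand preallocation. */
  PetscFunctionReturn(0);
}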
This is the memory info given by the log_summary for nv=19 Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0 Index Set 30 24 96544 0 IS L to G Mapping 4 0 0 0 Vec 46 17 338344 0 Vec Scatter 12 0 0 0 Matrix 22 0 0 0 Distributed array 2 0 0 0 Preconditioner 3 0 0 0 Krylov Solver 3 0 0 0 ======================================================================================================================== and the malloc_info ------------------------------------------ [0] Maximum memory PetscMalloc()ed 184348608 maximum size of entire process 225873920 [0] Memory usage sorted by function [0] 2 3216 ClassPerfLogCreate() [0] 2 1616 ClassRegLogCreate() [0] 6 9152 DACreate() [0] 17 114128 DACreate_3D() [0] 3 48 DAGetCoordinateDA() [0] 10 265632 DAGetMatrix3d_MPIAIJ() [0] 3 48 DASetVertexDivision() [0] 2 6416 EventPerfLogCreate() [0] 1 12800 EventPerfLogEnsureSize() [0] 2 1616 EventRegLogCreate() [0] 1 3200 EventRegLogRegister() [0] 12 329376 ISAllGather() [0] 50 89344 ISCreateBlock() [0] 25 354768 ISCreateGeneral() [0] 60 7920 ISCreateStride() [0] 12 161728 ISGetIndices_Stride() [0] 2 21888 ISLocalToGlobalMappingBlock() [0] 2 21888 ISLocalToGlobalMappingCreate() [0] 12 1728 ISLocalToGlobalMappingCreateNC() [0] 9 2544 KSPCreate() [0] 1 16 KSPCreate_MINRES() [0] 1 16 KSPCreate_Richardson() [0] 3 48 KSPDefaultConvergedCreate() [0] 66 41888 MatCreate() [0] 6 960 MatCreate_MPIAIJ() [0] 16 5632 MatCreate_SeqAIJ() [0] 4 12000 MatGetRow_MPIAIJ() [0] 4 64 MatGetSubMatrices_MPIAIJ() [0] 160 941760 MatGetSubMatrices_MPIAIJ_Local() [0] 4 121664 MatGetSubMatrix_MPIAIJ_Private() [0] 16 304000 MatMarkDiagonal_SeqAIJ() [0] 80 181061344 MatSeqAIJSetPreallocation_SeqAIJ() [0] 12 113792 MatSetUpMultiply_MPIAIJ() [0] 12 288 MatStashCreate_Private() [0] 50 864 MatStashScatterBegin_Private() [0] 120 108096 MatZeroRows_MPIAIJ() [0] 10 182560 Mat_CheckInode() [0] 9 1776 PCCreate() [0] 1 144 PCCreate_FieldSplit() [0] 2 64 PCCreate_Jacobi() [0] 4 192 PCFieldSplitSetFields_FieldSplit() [0] 1 16 PCSetFromOptions_FieldSplit() [0] 5 22864 PCSetUp_FieldSplit() [0] 4 64 PetscCommDuplicate() [0] 1 4112 PetscDLLibraryOpen() [0] 6 24576 PetscDLLibraryRetrieve() [0] 45 1712 PetscDLLibrarySym() [0] 579 27792 PetscFListAdd() [0] 48 2112 PetscGatherMessageLengths() [0] 52 832 PetscGatherNumberOfMessages() [0] 90 4320 PetscLayoutCreate() [0] 64 1392 PetscLayoutSetUp() [0] 4 64 PetscLogPrintSummary() [0] 12 384 PetscMaxSum() [0] 24 6528 PetscOListAdd() [0] 28 1792 PetscObjectSetState() [0] 8 192 PetscOptionsGetEList() [0] 16 4842288 PetscPostIrecvInt() [0] 12 4842224 PetscPostIrecvScalar() [0] 0 32 PetscPushSignalHandler() [0] 1 432 PetscStackCreate() [0] 1798 54816 PetscStrallocpy() [0] 30 248832 PetscStrreplace() [0] 2 45888 PetscTableAdd() [0] 24 446528 PetscTableCreate() [0] 3 96 PetscTokenCreate() [0] 1 16 PetscViewerASCIIMonitorCreate() [0] 1 16 PetscViewerASCIIOpen() [0] 3 496 PetscViewerCreate() [0] 1 64 PetscViewerCreate_ASCII() [0] 2 528 StackCreate() [0] 2 1008 StageLogCreate() [0] 6 14400 User provided function() [0] 138 58880 VecCreate() [0] 66 1401952 VecCreate_MPI_Private() [0] 7 221312 VecCreate_Seq() [0] 9 288 VecCreate_Seq_Private() [0] 6 160 VecDuplicateVecs_Default() [0] 3 3008 VecGetArray3d() [0] 42 49536 VecScatterCreate() [0] 16 512 VecScatterCreateCommon_PtoS() [0] 20 213024 VecScatterCreate_PtoP() [0] 252 881536 VecScatterCreate_PtoS() [0] 74 1184 VecStashCreate_Private() -- Loic Gouarin 
Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From gdiso at ustc.edu Fri Feb 18 06:04:51 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 20:04:51 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> ----- Original Message ----- From: "Matthew Knepley" To: "PETSc users list" Sent: Friday, February 18, 2011 7:44 PM Subject: Re: [petsc-users] Is it possible to free extra memory after matassemble? > 2011/2/18 Gong Ding > >> Hi, >> After update my FVM code to support higher order, I have to preallocate >> more memory when creating the matrix. However, only a few cells (determined >> at run time) needed to be high order, thus preallocated memory is overkill >> too much. >> >> Is it possible to add a function to reassemble the AIJ matrix to free the >> extra memory? >> Or it has already done when MatAssembly is called? >> > > This is done during MatAssemblyEnd(). However, there is no guarantee that > the operating system > actually returns that memory to general use. > > Matt Could you please point out where can I find the MatAssemblyEnd routine for sequence AIJ matrix? I'd like to take a look at it. From knepley at gmail.com Fri Feb 18 06:27:22 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 06:27:22 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <0DC3F71BF33F4DF7A5C1897C0D21B7C2@cogendaeda> Message-ID: On Fri, Feb 18, 2011 at 6:04 AM, Gong Ding wrote: > > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Friday, February 18, 2011 7:44 PM > Subject: Re: [petsc-users] Is it possible to free extra memory after > matassemble? > > > > 2011/2/18 Gong Ding > > > >> Hi, > >> After update my FVM code to support higher order, I have to preallocate > >> more memory when creating the matrix. However, only a few cells > (determined > >> at run time) needed to be high order, thus preallocated memory is > overkill > >> too much. > >> > >> Is it possible to add a function to reassemble the AIJ matrix to free > the > >> extra memory? > >> Or it has already done when MatAssembly is called? > >> > > > > This is done during MatAssemblyEnd(). However, there is no guarantee that > > the operating system > > actually returns that memory to general use. > > > > Matt > > Could you please point out where can I find the MatAssemblyEnd routine for > sequence AIJ matrix? > I'd like to take a look at it. > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/mat/impls/aij/seq/aij.c.html#MatAssemblyEnd_SeqAIJ Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Fri Feb 18 06:29:18 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 20:29:18 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? 
References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> ----- Original Message ----- From: "Matthew Knepley" To: "PETSc users list" Sent: Friday, February 18, 2011 7:44 PM Subject: Re: [petsc-users] Is it possible to free extra memory after matassemble? > 2011/2/18 Gong Ding > >> Hi, >> After update my FVM code to support higher order, I have to preallocate >> more memory when creating the matrix. However, only a few cells (determined >> at run time) needed to be high order, thus preallocated memory is overkill >> too much. >> >> Is it possible to add a function to reassemble the AIJ matrix to free the >> extra memory? >> Or it has already done when MatAssembly is called? >> > > This is done during MatAssemblyEnd(). However, there is no guarantee that > the operating system > actually returns that memory to general use. > > Matt I had checked the function MatAssemblyEnd_SeqAIJ in aij.c. It seems it only pack the a, i, j array, but didn't free memory. I guess one should malloc three new array with exact size and copy values to the new one, and then free the old a, i, j array? From knepley at gmail.com Fri Feb 18 06:57:48 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 06:57:48 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <77C896F9110B4134905B37D0ED5E87FB@cogendaeda> Message-ID: On Fri, Feb 18, 2011 at 6:29 AM, Gong Ding wrote: > > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Friday, February 18, 2011 7:44 PM > Subject: Re: [petsc-users] Is it possible to free extra memory after > matassemble? > > > > 2011/2/18 Gong Ding > > > >> Hi, > >> After update my FVM code to support higher order, I have to preallocate > >> more memory when creating the matrix. However, only a few cells > (determined > >> at run time) needed to be high order, thus preallocated memory is > overkill > >> too much. > >> > >> Is it possible to add a function to reassemble the AIJ matrix to free > the > >> extra memory? > >> Or it has already done when MatAssembly is called? > >> > > > > This is done during MatAssemblyEnd(). However, there is no guarantee that > > the operating system > > actually returns that memory to general use. > > > > Matt > > I had checked the function MatAssemblyEnd_SeqAIJ in aij.c. > It seems it only pack the a, i, j array, but didn't free memory. > I guess one should malloc three new array with exact size and copy values > to the new one, and then free the old a, i, j array? > If you want that, just do MatCopy(). Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 18 08:00:12 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:00:12 -0600 Subject: [petsc-users] Running an iterative method for a large number of iterations: Possible blow up? In-Reply-To: References: Message-ID: Are you using petsc-dev? If not you should switch to it, it has an additional convergence test based on the residual of the normal equations. 
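Barry's remark refers to a test that ships with petsc-dev; with a stock 3.1 build, one user-side way to stop "when the residual norm looks constant" is a custom convergence test installed with KSPSetConvergenceTest. The sketch below only checks stagnation of the residual norm reported to the monitor; the StagCtx struct, the function name and the 1e-8 threshold are all made up for illustration, and this is not the normal-equations test mentioned above.

#include "petscksp.h"

typedef struct { PetscReal last_rnorm; } StagCtx;   /* illustrative context, initialize to 0 */

/* Declare convergence once the residual norm changes very little between iterations. */
PetscErrorCode StagnationConverged(KSP ksp,PetscInt it,PetscReal rnorm,
                                   KSPConvergedReason *reason,void *ctx)
{
  StagCtx *sc = (StagCtx*)ctx;

  PetscFunctionBegin;
  *reason = KSP_CONVERGED_ITERATING;
  if (it > 0 && sc->last_rnorm > 0.0 &&
      PetscAbsReal(sc->last_rnorm - rnorm) < 1.e-8*sc->last_rnorm) {
    *reason = KSP_CONVERGED_RTOL;                  /* residual has stagnated, stop */
  }
  sc->last_rnorm = rnorm;
  PetscFunctionReturn(0);
}

/* Installed (3.1-era signature) with:
   StagCtx ctx = {0.0};
   ierr = KSPSetConvergenceTest(ksp,StagnationConverged,&ctx,PETSC_NULL);CHKERRQ(ierr); */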
Barry One possible explanation for the large values is that they are in the null space of the operator and though they don't increase the residual norm the solution just accumulates them after a large number of iterations. On Feb 18, 2011, at 1:26 AM, Gaurish Telang wrote: > Hi, > > I was trying to use LSQR algorithm for solving a least squares problem of size 2683x1274. I notice that if I allow the iterative method to run for a large number of iterations > after it has converged (i.e. output of -ksp_monitor KSPresidualnorm seems constant upto the 4th digit) , some numbers in the answer vector seem to get inordinately large. > > > I seem to get my answer comparable to Matlab after 951 iterations, but when I increase the number of iterations to 10000 some numbers seem very large. > > Is this expected? Also, how do I terminate my iteration when my residual norm seems constant? > > Thanks From bsmith at mcs.anl.gov Fri Feb 18 08:04:13 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:04:13 -0600 Subject: [petsc-users] Re-zero sparse matrix and MatZeroEntries In-Reply-To: References: <4D5E55F0.8000407@gmail.com> Message-ID: <068DC357-A149-4166-8F76-B94F5A93EE8A@mcs.anl.gov> On Feb 18, 2011, at 5:57 AM, Matthew Knepley wrote: > On Fri, Feb 18, 2011 at 5:20 AM, TAY wee-beng wrote: > Hi, > > I am trying to solve the Navier Stokes momentum equation of a moving body. > > For most points (a), I will be using the north/south/east/west locations to form the equation. > > However, for some points (b), due to the moving body, I will be using some interpolation schemes. At different time step, the interpolation template will be different for these points. Hence, I will use different neighboring points to form the equation. > > Moreover, points (a) can change to points (b) and vice versa. > > I wonder if I can use MatZeroEntries to re-zero the whole sparse matrix. But in the manual, it states that "For sparse matrices this routine retains the old nonzero structure. ". However for my case, the template is different at different time step. > > Hence what is the most efficient procedure? > > There is no efficiently, updateable data structure in PETSc, since this would be much lower performance for general use. > > I suggest using a matrix-free application of your full operator, and a fixed sparsity operator for your preconditioner. Alternatively, > you can rebuild the matrix structure at each step, which might be the best option depending on how much work you do in each solve. Here Matt means simply create a new Mat each time the nonzero structure will change and properly preallocate it each time. The additional cost of the new creation is at most a few percent of a run and the code is easier to maintain. Barry > > Matt > > -- > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From bsmith at mcs.anl.gov Fri Feb 18 08:11:38 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:11:38 -0600 Subject: [petsc-users] Is it possible to free extra memory after mat assemble? 
In-Reply-To: <59E77951AFD2405782C315F25FF16932@cogendaeda> References: <59E77951AFD2405782C315F25FF16932@cogendaeda> Message-ID: <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From gdiso at ustc.edu Fri Feb 18 08:20:23 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 22:20:23 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: Hi, I had added following code to aij.c in the function MatAssemblyEnd_SeqAIJ after pack the matrix elements. Just allocate three new array and copy data to them. This skill is heavily used in macro MatSeqXAIJReallocateAIJ if( a->maxnz != a->nz ) { ierr = PetscMalloc3(a->nz,MatScalar,&new_a,a->nz,PetscInt,&new_j,A->rmap->n+1,PetscInt,&new_i);CHKERRQ(ierr); ierr = PetscMemcpy(new_a,a->a,a->nz*sizeof(MatScalar));CHKERRQ(ierr); ierr = PetscMemcpy(new_i,a->i,(A->rmap->n+1)*sizeof(PetscInt));CHKERRQ(ierr); ierr = PetscMemcpy(new_j,a->j,a->nz*sizeof(PetscInt));CHKERRQ(ierr); ierr = MatSeqXAIJFreeAIJ(A,&a->a,&a->j,&a->i);CHKERRQ(ierr); a->a = new_a; a->i = new_i; a->j = new_j; a->maxnz = a->nz; } It seems work well. However, Barry, do you think it has some problem? On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. 
If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From bsmith at mcs.anl.gov Fri Feb 18 08:23:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:23:29 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102172301.16497.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <201102172121.46026.juhaj@iki.fi> <9803FEA2-DAB1-41EF-A432-F6C684D19A89@mcs.anl.gov> <201102172301.16497.juhaj@iki.fi> Message-ID: <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> I don't know how to handle the f'(1) = b. I was always taught to first introduce new variables to reduce the problem to a first order equation. For example let g = f' and the new problem is F(f,g,g') = 0 with the additional equations g = f' now there are no second derivatives. Barry On Feb 17, 2011, at 5:01 PM, Juha J?ykk? wrote: >> On boundary points where you want your mathematical solution x*| at that >> point = a you need to use for your coded function f(x) = x - a. Its >> derivative is f'(x) = 1 which is nonzero is fine. If the derivative at >> other points is order K you can use f(x) = K*(x - a) so the derivate at >> that point is K. > > I am not sure, I understood this. Just to make sure there is no confusion with > the notation, my unknown function be called f and my independent variable x > and f is defined for 0 <= x <= 1. I use f' for the derivative of f. The > nonlinear equation I want to solve is F(f,f',f'',x)=0. > > So, if I want f(1) = a and f'(1) = b, should I set the F(1) = b*(f-a) in the > code? Will that not give 0 residual when f(1)=a regardless of it derivative? > > Or, alternatively, is my approach totally wrong to begin with? I took a step > back and started to work with > > r f''/f - r (f'/f)^2 + f'/f = 0 > > only and cannot get it to converge any more than my actual problem. Now, for > this I even know the general solution, so it should be easy to solve this for > f(1)=1, f'(1)=2 (or 1/2, but that has singular derivative at 0, so perhaps it > is not a good example). > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- From bsmith at mcs.anl.gov Fri Feb 18 08:26:23 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 08:26:23 -0600 Subject: [petsc-users] Is it possible to free extra memory after matassemble? In-Reply-To: References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: On Feb 18, 2011, at 8:20 AM, Gong Ding wrote: > Hi, > I had added following code to aij.c in the function MatAssemblyEnd_SeqAIJ after pack the matrix elements. > Just allocate three new array and copy data to them. 
> This skill is heavily used in macro MatSeqXAIJReallocateAIJ > if( a->maxnz != a->nz ) { > ierr = PetscMalloc3(a->nz,MatScalar,&new_a,a->nz,PetscInt,&new_j,A->rmap->n+1,PetscInt,&new_i);CHKERRQ(ierr); > ierr = PetscMemcpy(new_a,a->a,a->nz*sizeof(MatScalar));CHKERRQ(ierr); > ierr = PetscMemcpy(new_i,a->i,(A->rmap->n+1)*sizeof(PetscInt));CHKERRQ(ierr); > ierr = PetscMemcpy(new_j,a->j,a->nz*sizeof(PetscInt));CHKERRQ(ierr); > ierr = MatSeqXAIJFreeAIJ(A,&a->a,&a->j,&a->i);CHKERRQ(ierr); > a->a = new_a; > a->i = new_i; > a->j = new_j; > a->maxnz = a->nz; > } > > It seems work well. However, Barry, do you think it has some problem? Yes 1) you still have the HUGE grabbing of memory during the initial allocation of the matrix (sometimes with virtual memory this may not hurt you but sometimes it will). 2) processes rarely actually return memory they've malloced back to the underlying OS so the program is still sitting on all that memory (sometimes because of virtual memory this may not hurt you). Admittedly this is much easier than doing the preallocation correctly so if it works for you then great. Barry > > > On Feb 18, 2011, at 2:26 AM, Gong Ding wrote: > >> Hi, >> After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. > > You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. Why is the preallocated memory too much? > > The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. > > Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. > > Barry > > >> >> Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? >> Or it has already done when MatAssembly is called? >> >> > From gdiso at ustc.edu Fri Feb 18 08:31:29 2011 From: gdiso at ustc.edu (Gong Ding) Date: Fri, 18 Feb 2011 22:31:29 +0800 Subject: [petsc-users] Is it possible to free extra memory after matassemble? References: <59E77951AFD2405782C315F25FF16932@cogendaeda> <259A9E2B-5C07-4A4E-896B-F5FC59E6C1DC@mcs.anl.gov> Message-ID: Thank Barry, but at the moment I only want to loop cells once. The "loop" operation is very time consuming because I use AD to build the Jacobian matrix and there are millons of cells. Anyway, I'd like to allocate enough memory and free extra memory before solving the matrix (usually by direct solver such as MUMPS). > Hi, > After update my FVM code to support higher order, I have to preallocate more memory when creating the matrix. However, only a few cells (determined at run time) needed to be high order, thus preallocated memory is overkill too much. You should preallocate ONLY the space you need so that only the space you need is ever allocated. In Unix/Windows there is no good way to recover memory. 
Why is the preallocated memory too much? The way to do this is each time the matrix nonzero structure will change. (1) loop over you cells determining what needs to be "high order" and determine the number of nonzeros for each row, the (2) preallocate the matrix, then (3) loop over cells again building the actual sparse matrix entries and putting them in. If the matrix nonzero structure changes each time step then just delete the old matrix and generate a new one each time, creating a new matrix with the proper preallocation is not an expensive operation. Yes it may seem inefficient to loop over the cells twice but you will find that it actually works very well, saves tons of memory and is much much faster than not preallocating. Barry > > Is it possible to add a function to reassemble the AIJ matrix to free the extra memory? > Or it has already done when MatAssembly is called? > > From fpacull at fluorem.com Fri Feb 18 09:22:09 2011 From: fpacull at fluorem.com (francois pacull) Date: Fri, 18 Feb 2011 16:22:09 +0100 Subject: [petsc-users] ILU, ASM, GMRES and memory Message-ID: <4D5E8EA1.8090809@fluorem.com> Dear PETSc team, I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: 1 - the local rows of Amat (ksp&pc's linear system matrix) 2 - the local rows of Pmat (ksp&pc's precond matrix) 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) 4 - the incomplete factorization of P[i] (subpc's ILU matrix) Is it correct? If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? Thanks for your help, francois. From jdbst21 at gmail.com Fri Feb 18 09:36:12 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Fri, 18 Feb 2011 10:36:12 -0500 Subject: [petsc-users] Mumps Message-ID: Hello, I have been have problems using Mumps on a large (O(1M) sparse martrix) on multiple cores. I first used the Petsc interface but it would hang sometime. Therefore, I wrote it now using MUMPS's C interface. The code works for one or two cores, but again hangs in the factorization stage using 4 cores. Note: That mumps, BLASC, and Scalpack was installed with petsc using --download Compiled using intel 11.1 If anyone has any ideal, I would really appreciate some feedback. Thank you Josh -------------- next part -------------- An HTML attachment was scrubbed... 
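To make the two-pass recipe Barry describes earlier in this thread concrete (count nonzeros per row, preallocate exactly, then fill), here is a rough sequential sketch; the per-row counts, the hypothetical row_is_high_order test and the wrapper name are placeholders, and in parallel MatCreateMPIAIJ with d_nnz/o_nnz arrays plays the same role as MatCreateSeqAIJ does here.

#include "petscmat.h"

/* Pass 1: count nonzeros per row.  Pass 2: create with exact preallocation and fill. */
PetscErrorCode build_fvm_matrix(PetscInt nrows, Mat *A)
{
  PetscInt       *nnz,i;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscMalloc(nrows*sizeof(PetscInt),&nnz);CHKERRQ(ierr);
  for (i=0; i<nrows; i++) {
    nnz[i] = 7;                 /* placeholder: low-order 7-point stencil            */
    /* if (row_is_high_order(i)) nnz[i] = 19;   hypothetical per-cell decision       */
  }
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,nrows,nrows,0,nnz,A);CHKERRQ(ierr);
  ierr = PetscFree(nnz);CHKERRQ(ierr);
  for (i=0; i<nrows; i++) {
    /* Pass 2: MatSetValues(*A,...) with exactly the entries counted above. */
  }
  ierr = MatAssemblyBegin(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}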
URL: From bsmith at mcs.anl.gov Fri Feb 18 09:42:29 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 09:42:29 -0600 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: <4D5E8EA1.8090809@fluorem.com> References: <4D5E8EA1.8090809@fluorem.com> Message-ID: On Feb 18, 2011, at 9:22 AM, francois pacull wrote: > Dear PETSc team, > > I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. > > So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: > 1 - the local rows of Amat (ksp&pc's linear system matrix) > 2 - the local rows of Pmat (ksp&pc's precond matrix) > 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) > 4 - the incomplete factorization of P[i] (subpc's ILU matrix) > > Is it correct? > If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function PCASMFreeSpace(PC pc) { PC_ASM *osm = (PC_ASM*)pc->data; PetscErrorCode ierr; if (osm->pmat) { if (osm->n_local_true > 0) { ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); } osm->pmat = 0; } if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} return 0; } run make in that directory. Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). report any problems to petsc-maint at mcs.anl.gov > When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) > From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" > > Thanks for your help, > francois. > > From dominik at itis.ethz.ch Fri Feb 18 09:43:56 2011 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Fri, 18 Feb 2011 16:43:56 +0100 Subject: [petsc-users] custom compiler flags on Windows In-Reply-To: References: Message-ID: I have manage to smuggle my options along with COPTFLAGS, but not CFLAGS. The latter seems ignored (cygwin + windows). I tried exporting them as shell variables as well as attaching them to the command line before ./configure - in either case they are not to be found in configure.log. Minor issue (just multiple compiler warnings about overwritten switches), but if you still have a clean solution I would be glad to learn it for future. 
Many thanks and regards, Dominik > On Thu, Feb 17, 2011 at 7:00 PM, Satish Balay wrote: >> On Thu, 17 Feb 2011, Matthew Knepley wrote: >> >>> On Thu, Feb 17, 2011 at 10:37 AM, Dominik Szczerba wrote: >>> >>> > I need to use some special compile flags when compiling with 'cl' on >>> > Windows. >>> > While configuring I currently use --with-cxx='win32fe cl', which works >>> > fine, but if I add some flags after cl the configure brakes, >>> > complaining that the compiler does not work. >>> > I also tried using --with-cxx='cl /MY /OPTIONS' but the result is the >>> > same as before. >>> > Is there a way to specify my own flags with Petsc (or add to them)? >>> > >>> >>> --COPTFLAGS="" --FOPTFLAGS="" --CXXOPTFLAGS="" >> >> Generally CFLAGS should work. However with MS compilers - we have some >> defaults without which the compilers might not work. [esp with mpi]. >> So when changing CFLAGS one might have to include the defaults plus >> the additional flags. >> >> However COPTFLAGS migh be easier to add to CFLAGS - and provided to >> primarily specify optimization flags - but can be used for for other >> flags aswell.. >> >> Satish >> >> > From knepley at gmail.com Fri Feb 18 09:58:21 2011 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Feb 2011 09:58:21 -0600 Subject: [petsc-users] Mumps In-Reply-To: References: Message-ID: On Fri, Feb 18, 2011 at 9:36 AM, Joshua Booth wrote: > Hello, > > I have been have problems using Mumps on a large (O(1M) sparse martrix) on > multiple cores. I first used the Petsc interface but it would hang > sometime. > Therefore, I wrote it now using MUMPS's C interface. > The code works for one or two cores, but again hangs in the factorization > stage using 4 cores. > Have you verified that this is actually hanging (in PETSc and MUMPS) and not just slow? These factorizations can have a large amount of fill and thus can get very slow and use a lot of memory for matrices this large. Matt > Note: That mumps, BLASC, and Scalpack was installed with petsc using > --download > Compiled using intel 11.1 > > If anyone has any ideal, I would really appreciate some feedback. > > Thank you > > Josh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 18 10:00:22 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 18 Feb 2011 10:00:22 -0600 Subject: [petsc-users] Mumps In-Reply-To: References: Message-ID: Joshua : > > I have been have problems using Mumps on a large (O(1M) sparse martrix) on > multiple cores.? I first used the Petsc interface but it would hang > sometime. "Petsc interface"? Do you mean Petsc LU sequential solver or petsc-mumps interface? > Therefore, I wrote it now using MUMPS's C interface. > The code works for one or two cores, but again hangs in the factorization > stage using 4 cores. Do you mean using MUMPS's C interface without petsc? > > Note:? That mumps, BLASC, and Scalpack was installed with petsc using > --download > Compiled using intel 11.1 > > If anyone has any ideal, I would really appreciate some feedback. > You need figure out where it hangs. Does it hang on smaller size problems? I rarely see hang from MUMPS. 
Try increasing fill ratio -mat_mumps_icntl_14 <20>: ICNTL(14): percentage of estimated workspace increase (None) Hong Hong > Thank you > > Josh > From bsmith at mcs.anl.gov Fri Feb 18 10:21:44 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 10:21:44 -0600 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: References: <4D5E8EA1.8090809@fluorem.com> Message-ID: <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> You will still need to delete your reference to the pmat by calling MatDestroy(pmat); after you have called KSPSetOperators(). Barry On Feb 18, 2011, at 9:42 AM, Barry Smith wrote: > > On Feb 18, 2011, at 9:22 AM, francois pacull wrote: > >> Dear PETSc team, >> >> I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. >> >> So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: >> 1 - the local rows of Amat (ksp&pc's linear system matrix) >> 2 - the local rows of Pmat (ksp&pc's precond matrix) >> 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) >> 4 - the incomplete factorization of P[i] (subpc's ILU matrix) >> >> Is it correct? >> If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? > > You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function > > PCASMFreeSpace(PC pc) > { > PC_ASM *osm = (PC_ASM*)pc->data; > PetscErrorCode ierr; > > if (osm->pmat) { > if (osm->n_local_true > 0) { > ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); > } > osm->pmat = 0; > } > if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} > return 0; > } > run make in that directory. > > Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). > > report any problems to petsc-maint at mcs.anl.gov > > >> When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? > > Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) > >> From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? > > Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" > > >> >> Thanks for your help, >> francois. >> >> > From gaurish108 at gmail.com Fri Feb 18 11:04:21 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 12:04:21 -0500 Subject: [petsc-users] value of PETSC_ARCH Message-ID: Hi to install PETSc without the debugging version, what value of PETSC_ARCH should I give? Or is this automatically decided by PETSc during the configure sterp? I know that during configure I should pass --with-debugging=no -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean at mcs.anl.gov Fri Feb 18 11:09:44 2011 From: sean at mcs.anl.gov (Sean Farley) Date: Fri, 18 Feb 2011 11:09:44 -0600 Subject: [petsc-users] value of PETSC_ARCH In-Reply-To: References: Message-ID: > > Hi to install PETSc without the debugging version, what value of PETSC_ARCH > should I give? Or is this automatically decided by PETSc during the > configure sterp? Anything you like to distinguish the different arches. PETSC_ARCH is just a unique label so that you can install multiple versions of PETSc without duplicating the entire source tree. For example, $ ls $PETSC_DIR ... darwin10.5.0-cxx-debug darwin10.5.0-sieve-debug darwin10.5.0-cxx-intel darwin10.5.0-sieve-intel ... are four different installs (arches) I have with the name of each PETSC_ARCH reminding me what the difference is between them. Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From fpacull at fluorem.com Fri Feb 18 12:00:44 2011 From: fpacull at fluorem.com (francois pacull) Date: Fri, 18 Feb 2011 19:00:44 +0100 Subject: [petsc-users] ILU, ASM, GMRES and memory In-Reply-To: <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> References: <4D5E8EA1.8090809@fluorem.com> <633AC404-2940-4F82-8A18-2E5195D68F29@mcs.anl.gov> Message-ID: <4D5EB3CC.2090106@fluorem.com> Thanks a lot Barry, I did include the new function PCASMFreeSpace to asm.c and compile it... I will measure the effect on memory next week. Regards, francois. Barry Smith wrote: > You will still need to delete your reference to the pmat by calling MatDestroy(pmat); after you have called KSPSetOperators(). > > Barry > > On Feb 18, 2011, at 9:42 AM, Barry Smith wrote: > > >> On Feb 18, 2011, at 9:22 AM, francois pacull wrote: >> >> >>> Dear PETSc team, >>> >>> I am using a gmres solver along with an additive Schwarz preconditioner and an ILU factorization within the sub-domains: ksp GMRES + pc ASM + subksp PREONLY + subpc ILU (MAT_SOLVER_PETSC). Also, I am using a preconditioner matrix Pmat that is different from the linear system operator matrix Amat. >>> >>> So, from my understanding, just after the end of the ILU factorization (for example, just after a call to PCSetUp(subpc) and before a call to KSPSolve(ksp,...)), the rank i process holds in the memory: >>> 1 - the local rows of Amat (ksp&pc's linear system matrix) >>> 2 - the local rows of Pmat (ksp&pc's precond matrix) >>> 3 - the sub-domain preconditioner operator, P[i], which is the local diagonal block of Pmat augmented with the overlap (subksp&subpc's matrix, linear system matrix = precond matrix) >>> 4 - the incomplete factorization of P[i] (subpc's ILU matrix) >>> >>> Is it correct? >>> If it is, how can I destroy parts 2 and 3, Pmat and the P[i]'s, in order to save some memory space for the Arnoldi vectors? >>> >> You will need to "hack" slightly to get the affect. Edit src/ksp/pc/impls/asm/asm.c and add a new function >> >> PCASMFreeSpace(PC pc) >> { >> PC_ASM *osm = (PC_ASM*)pc->data; >> PetscErrorCode ierr; >> >> if (osm->pmat) { >> if (osm->n_local_true > 0) { >> ierr = MatDestroyMatrices(osm->n_local_true,&osm->pmat);CHKERRQ(ierr); >> } >> osm->pmat = 0; >> } >> if (pc->pmat) {ierr = MatDestroy(pc->pmat);CHKERRQ(ierr); pc->pmat = 0;} >> return 0; >> } >> run make in that directory. >> >> Now call this routine in your program AFTER calling KSPSetUp() and KSPSetUpOnBlocks() or SNESSetUp() but before KSPSolve() or SNESSolve(). 
>> >> report any problems to petsc-maint at mcs.anl.gov >> >> >> >>> When Pmat is destroyed with MatDestroy, its corresponding memory space seems to be actually freed only when the ksp is destroyed? >>> >> Yes, all PETSc objects are reference counted and the KSP object keeps a reference to Pmat (actually the PC underneath keeps the reference.) >> >> >>> From what I remember, PCFactorSetUseInPlace will destroy the P[i]'s, but under the constraints that there is no fill-in and the natural matrix ordering is used? >>> >> Under those conditions the space in P[i] is reused for the factor, thus saving the space of the "incomplete factorization of P" >> >> >> >>> Thanks for your help, >>> francois. >>> >>> >>> > > > From gaurish108 at gmail.com Fri Feb 18 12:11:22 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Fri, 18 Feb 2011 13:11:22 -0500 Subject: [petsc-users] Strange behavior of -log_summary and final answer Message-ID: Hi I am using petsc-3.1-p7. My code seems to be behaving strangely during the execution step. I am solving a simple least squares problem with the LSQR routine. The options that I am using at the terminal are.: mpiexec -n 1 ./rect_input -f A2 -vector b2 -ksp_type lsqr -pc_type none -log_summary -ksp_max_it 1038 Even though I have used the -log_summary flag the performance summary does *not* get displayed on some occasions. The answer also is different from the answer I expect (which I obtain from matlab). But after running the executable twice or thrice, I get the performance summary along with the correct answer. Why is this happening? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 18 12:30:47 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 18 Feb 2011 12:30:47 -0600 Subject: [petsc-users] Strange behavior of -log_summary and final answer In-Reply-To: References: Message-ID: Run with valgrind http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind Barry On Feb 18, 2011, at 12:11 PM, Gaurish Telang wrote: > Hi I am using petsc-3.1-p7. > > My code seems to be behaving strangely during the execution step. > > I am solving a simple least squares problem with the LSQR routine. The options that I am using at the terminal are.: > > mpiexec -n 1 ./rect_input -f A2 -vector b2 -ksp_type lsqr -pc_type none -log_summary -ksp_max_it 1038 > > Even though I have used the -log_summary flag the performance summary does *not* get displayed on some occasions. The answer also is different from the answer I expect (which I obtain from matlab). > > But after running the executable twice or thrice, I get the performance summary along with the correct answer. Why is this happening? > > > > > > > > > > > > > > > > > > > > From gdiso at ustc.edu Sat Feb 19 02:25:55 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 19 Feb 2011 16:25:55 +0800 Subject: [petsc-users] more flexible MatSetValues? Message-ID: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding From jed at 59A2.org Sat Feb 19 03:21:09 2011 From: jed at 59A2.org (Jed Brown) Date: Sat, 19 Feb 2011 10:21:09 +0100 Subject: [petsc-users] more flexible MatSetValues? 
In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: Values in each location are often set many times. Once per element in FEM, so about 20 times for P1 tets. That uses a lot more memory and you need to sort that beast to count correctly. Using a separate dynamic data structure for each row would be a lot more mallocs, but you could keep the rows sorted and avoid storing 20 copies, however insertion is still expensive. A heap is nice for insertion, but not for searching. So dynamic data structures could help, but they still cost quite a bit. The preallocation problem is trivial for finite difference methods so any useful solution needs to handle many insertions to the same location. On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Sat Feb 19 04:19:35 2011 From: juhaj at iki.fi (Juha =?iso-8859-1?q?J=E4ykk=E4?=) Date: Sat, 19 Feb 2011 10:19:35 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> References: <201102161417.09649.juhaj@iki.fi> <201102172301.16497.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> Message-ID: <201102191019.42133.juhaj@iki.fi> > I don't know how to handle the f'(1) = b. I was always taught to first > introduce new variables to reduce the problem to a first order equation. > For example let g = f' and the new problem is F(f,g,g') = 0 with the > additional equations g = f' now there are no second derivatives. Yes, that's always an option and for time stepping, for instance, that is probably always the best way to go, but I did not think that would be necessary for a simple second order ODE - albeit a non-linear one. Let me see what happens if I do that... Cheers, -Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From gdiso at ustc.edu Sat Feb 19 07:43:47 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sat, 19 Feb 2011 21:43:47 +0800 (CST) Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> So dynamic array is suitable for acting as a plus to the preallocation. Only a few extra matrix entries not considered in the preallocation are needed to be processed. For example, an integral boundary condition with dynamic integration range may have different nonzero entry in a row, which can be hold by the dynamic array. Values in each location are often set many times. Once per element in FEM, so about 20 times for P1 tets. That uses a lot more memory and you need to sort that beast to count correctly. Using a separate dynamic data structure for each row would be a lot more mallocs, but you could keep the rows sorted and avoid storing 20 copies, however insertion is still expensive. 
A heap is nice for insertion, but not for searching. So dynamic data structures could help, but they still cost quite a bit. The preallocation problem is trivial for finite difference methods so any useful solution needs to handle many insertions to the same location. On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: Hi, After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? And flush the buffer to real a,i,j array when MatAssemblyEnd is called? Gong Ding -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 19 08:27:51 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 19 Feb 2011 08:27:51 -0600 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> Message-ID: <097C0B3B-A792-42B4-83CD-C2EF1F642BA8@mcs.anl.gov> Because PETSc is designed as an object oriented library with class inheritance one could derive a new subclass with dynamic allocation from the current class without needing to write much new code. Barry On Feb 19, 2011, at 2:25 AM, Gong Ding wrote: > Hi, > After reading the source code of aij.c, I think the MatSetValues function can be more flexible when preallocation is not correct. > > Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer the operation? > And flush the buffer to real a,i,j array when MatAssemblyEnd is called? > > Gong Ding From jed at 59A2.org Sat Feb 19 08:45:02 2011 From: jed at 59A2.org (Jed Brown) Date: Sat, 19 Feb 2011 15:45:02 +0100 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: On Sat, Feb 19, 2011 at 14:43, Gong Ding wrote: an integral boundary condition with dynamic integration range Isn't there a maximum possible stencil for that boundary node? Why not preallocate for all of those and keep typical sparsity for the rest of the matrix? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdiso at ustc.edu Sat Feb 19 10:22:55 2011 From: gdiso at ustc.edu (Gong Ding) Date: Sun, 20 Feb 2011 00:22:55 +0800 (CST) Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: <27526938.31101298132575330.JavaMail.coremail@mail.ustc.edu> Some nonlocal phenomenon such as band to band tunneling in semiconductor. an integral boundary condition with dynamic integration range Isn't there a maximum possible stencil for that boundary node? Why not preallocate for all of those and keep typical sparsity for the rest of the matrix? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Feb 19 10:24:09 2011 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 19 Feb 2011 10:24:09 -0600 Subject: [petsc-users] more flexible MatSetValues? In-Reply-To: <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> References: <22BBE128ED964DC494E432F6322EE85E@cogendaeda> <4872253.30971298123027594.JavaMail.coremail@mail.ustc.edu> Message-ID: On Sat, Feb 19, 2011 at 7:43 AM, Gong Ding wrote: > So dynamic array is suitable for acting as a plus to the preallocation. 
> Only a few extra matrix entries not considered in the preallocation are > needed to be processed. > For example, an integral boundary condition with dynamic integration range > may have different nonzero entry in a row, > which can be hold by the dynamic array. > I did benchmark of the STL dynamic data structures, and the memory overhead is quite large. The problem here is not necessarily what could be done, but what community expectations are. People are not going to ditch their old Fortran code for something that allocations 3-4 times the memory. As Barry points out, you could easily make a new subclass. Matt > > > > Values in each location are often set many times. Once per element in FEM, > so about 20 times for P1 tets. That uses a lot more memory and you need to > sort that beast to count correctly. Using a separate dynamic data structure > for each row would be a lot more mallocs, but you could keep the rows sorted > and avoid storing 20 copies, however insertion is still expensive. A heap is > nice for insertion, but not for searching. > > So dynamic data structures could help, but they still cost quite a bit. The > preallocation problem is trivial for finite difference methods so any useful > solution needs to handle many insertions to the same location. > > On Feb 19, 2011 9:35 AM, "Gong Ding" wrote: > > Hi, > After reading the source code of aij.c, I think the MatSetValues function > can be more flexible when preallocation is not correct. > > Why not use a dynamic array such as c++ vector of triple(a, i, j) to buffer > the operation? > And flush the buffer to real a,i,j array when MatAssemblyEnd is called? > > Gong Ding > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From juhaj at iki.fi Mon Feb 21 06:10:19 2011 From: juhaj at iki.fi (Juha =?iso-8859-15?q?J=E4ykk=E4?=) Date: Mon, 21 Feb 2011 12:10:19 +0000 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102191019.42133.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> Message-ID: <201102211210.23297.juhaj@iki.fi> > > introduce new variables to reduce the problem to a first order equation. > > For example let g = f' and the new problem is F(f,g,g') = 0 with the > > additional equations g = f' now there are no second derivatives. > Let me see what happens if I do that... Ok, so this helps. Now I can get the solution to converge on a small lattice, of less than 20 points. Increasing the lattice gives divergent zig-zag "solutions". Now this is usual central differences behaviour: it decouples even lattice points from odd ones and now that I have both f and f' as unknowns, this decoupling is total. (It was not previously, since f'', computed from f, does not decouple.) Changing to simple forward differences does not help, but changing to three- point forward differences (=five-point stencil, but the backwards points are not used) fixes the problem and I now get convergence. That is, thanks for all the help. I can now return to my actual equation, which still does not converge with these tricks on any lattice larger than about 50 points. I suppose the problem here is similar and I just need to find a better discretisation. 
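In symbols, the reduction and the two stencils being compared are roughly (notation is illustrative, not taken from the actual code):

  introduce g = f', and solve  F(f, g, g') = 0  together with  g - f' = 0

  centered:   f'(x_i) ≈ (f_{i+1} - f_{i-1}) / (2h)
              skips x_i itself, so odd and even lattice points decouple once g is a separate unknown
  one-sided:  f'(x_i) ≈ (-3 f_i + 4 f_{i+1} - f_{i+2}) / (2h)
              still second-order accurate, but involves x_i and its two forward neighbours, which removes the decoupling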
Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From hung.thanh.nguyen at petrell.no Mon Feb 21 09:12:14 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Mon, 21 Feb 2011 16:12:14 +0100 Subject: [petsc-users] KSPBuildSolution In-Reply-To: <201102211210.23297.juhaj@iki.fi> References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> <201102211210.23297.juhaj@iki.fi> Message-ID: Hi Pets use I just install Pets on Windows (I am using C compiler and ITL MKL). And, then running ex2.cpp .... to get error : Error 2 error: identifier "_intel_fast_memcpy" is undefined C:\cygwin\home\Hung\petsc-3.1- p7\include\petscsys.h 1775 Please help me. Best regard Hung T. Nguyen -----Original Message----- From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Juha J?ykk? Sent: 21. februar 2011 13:10 To: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSPBuildSolution > > introduce new variables to reduce the problem to a first order equation. > > For example let g = f' and the new problem is F(f,g,g') = 0 with > > the additional equations g = f' now there are no second derivatives. > Let me see what happens if I do that... Ok, so this helps. Now I can get the solution to converge on a small lattice, of less than 20 points. Increasing the lattice gives divergent zig-zag "solutions". Now this is usual central differences behaviour: it decouples even lattice points from odd ones and now that I have both f and f' as unknowns, this decoupling is total. (It was not previously, since f'', computed from f, does not decouple.) Changing to simple forward differences does not help, but changing to three- point forward differences (=five-point stencil, but the backwards points are not used) fixes the problem and I now get convergence. That is, thanks for all the help. I can now return to my actual equation, which still does not converge with these tricks on any lattice larger than about 50 points. I suppose the problem here is similar and I just need to find a better discretisation. Cheers, Juha -- ----------------------------------------------- | Juha J?ykk?, juhaj at iki.fi | | http://www.maths.leeds.ac.uk/~juhaj | ----------------------------------------------- From knepley at gmail.com Mon Feb 21 09:36:16 2011 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Feb 2011 09:36:16 -0600 Subject: [petsc-users] KSPBuildSolution In-Reply-To: References: <201102161417.09649.juhaj@iki.fi> <4177EA4C-604C-42C2-B50A-5225C15E5B95@mcs.anl.gov> <201102191019.42133.juhaj@iki.fi> <201102211210.23297.juhaj@iki.fi> Message-ID: Send all the output of 'make test' along with configure.log and make.log to petsc-maint at mcs.anl.gov Matt On Mon, Feb 21, 2011 at 9:12 AM, Hung Thanh Nguyen < hung.thanh.nguyen at petrell.no> wrote: > Hi Pets use > I just install Pets on Windows (I am using C compiler and ITL MKL). And, > then running ex2.cpp .... to get error : > > Error 2 error: identifier "_intel_fast_memcpy" is undefined > C:\cygwin\home\Hung\petsc-3.1- > p7\include\petscsys.h 1775 > > Please help me. Best regard Hung T. 
Nguyen > > -----Original Message----- > From: petsc-users-bounces at mcs.anl.gov [mailto: > petsc-users-bounces at mcs.anl.gov] On Behalf Of Juha J?ykk? > Sent: 21. februar 2011 13:10 > To: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSPBuildSolution > > > > introduce new variables to reduce the problem to a first order > equation. > > > For example let g = f' and the new problem is F(f,g,g') = 0 with > > > the additional equations g = f' now there are no second derivatives. > > Let me see what happens if I do that... > > Ok, so this helps. Now I can get the solution to converge on a small > lattice, of less than 20 points. > > Increasing the lattice gives divergent zig-zag "solutions". Now this is > usual central differences behaviour: it decouples even lattice points from > odd ones and now that I have both f and f' as unknowns, this decoupling is > total. (It was not previously, since f'', computed from f, does not > decouple.) > > Changing to simple forward differences does not help, but changing to > three- point forward differences (=five-point stencil, but the backwards > points are not used) fixes the problem and I now get convergence. > > That is, thanks for all the help. I can now return to my actual equation, > which still does not converge with these tricks on any lattice larger than > about 50 points. I suppose the problem here is similar and I just need to > find a better discretisation. > > Cheers, > Juha > > -- > ----------------------------------------------- > | Juha J?ykk?, juhaj at iki.fi | > | http://www.maths.leeds.ac.uk/~juhaj | > ----------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemens.domanig at uibk.ac.at Tue Feb 22 04:29:16 2011 From: clemens.domanig at uibk.ac.at (Clemens Domanig) Date: Tue, 22 Feb 2011 11:29:16 +0100 Subject: [petsc-users] matrix/vector-library Message-ID: <4D638FFC.6060607@uibk.ac.at> Hi out there, just a short question: What matrix/vector-library do you use for doing calculations with small matrices/vectors while using Petsc for the large problems? I have to do lots of calculations with small matrices before assembling the large system of equations for which I use Petsc. So I want to make sure that there is no namespace-trouble, etc. Thx for your help - Clemens Domanig From jed at 59A2.org Tue Feb 22 05:14:15 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 22 Feb 2011 12:14:15 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: On Tue, Feb 22, 2011 at 11:29, Clemens Domanig wrote: > What matrix/vector-library do you use for doing calculations with small > matrices/vectors while using Petsc for the large problems? What language? How "small" is small? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 22 07:49:04 2011 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Feb 2011 07:49:04 -0600 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: BLAS/LAPACK. 
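A minimal sketch of what a small dense solve with LAPACK looks like from C (this assumes the usual Fortran symbol convention with a trailing underscore and column-major storage; the 3x3 values are arbitrary, and you link against the same BLAS/LAPACK that PETSc was configured with, so there is no namespace clash):

  #include <stdio.h>

  /* LAPACK dense solve A x = b; everything is passed by reference, Fortran style. */
  extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                     int *ipiv, double *b, int *ldb, int *info);

  int main(void)
  {
    double A[9] = { 4, 1, 0,    /* column 1 */
                    1, 3, 1,    /* column 2 */
                    0, 1, 2 };  /* column 3 */
    double b[3] = { 1, 2, 3 };  /* right-hand side, overwritten with the solution */
    int    n = 3, nrhs = 1, ipiv[3], info;

    dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);
    if (info) { printf("dgesv failed, info = %d\n", info); return 1; }
    printf("x = %g %g %g\n", b[0], b[1], b[2]);
    return 0;
  }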
Matt On Tue, Feb 22, 2011 at 4:29 AM, Clemens Domanig wrote: > Hi out there, > > just a short question: What matrix/vector-library do you use for doing > calculations with small matrices/vectors while using Petsc for the large > problems? > I have to do lots of calculations with small matrices before assembling the > large system of equations for which I use Petsc. So I want to make sure that > there is no namespace-trouble, etc. > > Thx for your help - Clemens Domanig > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemens.domanig at uibk.ac.at Tue Feb 22 09:42:00 2011 From: clemens.domanig at uibk.ac.at (Clemens Domanig) Date: Tue, 22 Feb 2011 16:42:00 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D638FFC.6060607@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> Message-ID: <4D63D948.6080109@uibk.ac.at> Maybe someone knows a library with commands that are similar to MatLab because I have to port hundreds lines of MatLab-code to C++. Thx - C. Clemens Domanig wrote: > Hi out there, > > just a short question: What matrix/vector-library do you use for doing > calculations with small matrices/vectors while using Petsc for the large > problems? > I have to do lots of calculations with small matrices before assembling > the large system of equations for which I use Petsc. So I want to make > sure that there is no namespace-trouble, etc. > > Thx for your help - Clemens Domanig From u.tabak at tudelft.nl Tue Feb 22 09:43:47 2011 From: u.tabak at tudelft.nl (Umut Tabak) Date: Tue, 22 Feb 2011 16:43:47 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D63D948.6080109@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> <4D63D948.6080109@uibk.ac.at> Message-ID: <4D63D9B3.5070804@tudelft.nl> On 02/22/2011 04:42 PM, Clemens Domanig wrote: > knows a library with commands that are similar to MatLab because I Not exactly equal but similar, see uBlas from Boost (documentation is a bit of the downside though...) or another one which I did not check, namely, Eigen, U. From jed at 59A2.org Tue Feb 22 09:49:11 2011 From: jed at 59A2.org (Jed Brown) Date: Tue, 22 Feb 2011 16:49:11 +0100 Subject: [petsc-users] matrix/vector-library In-Reply-To: <4D63D948.6080109@uibk.ac.at> References: <4D638FFC.6060607@uibk.ac.at> <4D63D948.6080109@uibk.ac.at> Message-ID: On Tue, Feb 22, 2011 at 16:42, Clemens Domanig wrote: > Maybe someone knows a library with commands that are similar to MatLab > because I have to port hundreds lines of MatLab-code to C++. >From C++, you might consider a template library like Eigen ( http://eigen.tuxfamily.org). It overloads the usual arithmetic operators and performance is very good for small sizes. The downside is longer compilation time than a classic library and confusing error messages if you have type errors. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Tue Feb 22 11:06:22 2011 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 22 Feb 2011 12:06:22 -0500 Subject: [petsc-users] Pre-conditioners in PETSc Message-ID: I am quite confused on using pre-conditioners in PETSc (1) When we use KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat,MatStructure flag), why does the manual page say that Pmat is usually the same as Amat? 
Is Pmat the preconditioning matrix itself, or is Pmat a matrix to which preconditioning techniques must be applied via "-pc_type " ? (2) Also suppose the succeeding statement of KSPSetOperators is KSPSetFromOptions(ksp_context) and I pass "-pc_type none" at the terminal, would this mean that Pmat is not at all needed in PETSc's calculations?? Thank you, Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecoon at lanl.gov Tue Feb 22 11:14:35 2011 From: ecoon at lanl.gov (Ethan Coon) Date: Tue, 22 Feb 2011 10:14:35 -0700 Subject: [petsc-users] Pre-conditioners in PETSc In-Reply-To: References: Message-ID: <1298394875.2045.3.camel@echo.lanl.gov> On Tue, 2011-02-22 at 12:06 -0500, Gaurish Telang wrote: > I am quite confused on using pre-conditioners in PETSc > > (1) > When we use KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat,MatStructure flag), why does the manual page say that Pmat is usually the same as Amat? > > Is Pmat the preconditioning matrix itself, or is Pmat a matrix to which preconditioning techniques must be applied via "-pc_type " ? The 2nd. Pmat is NOT the approximate inverse of Amat, it is a matrix whose approximate inverse will be used to multiple Amat before the KSP is used. > > (2) > Also suppose the succeeding statement of KSPSetOperators is KSPSetFromOptions(ksp_context) and I pass "-pc_type none" at the terminal, would this mean that Pmat is not at all needed > > in PETSc's calculations?? Correct, pmat is not used in that case. In that case, you still have to pass something in to KSPSetOperators, so Amat for both is the likely choice. Ethan > > Thank you, > > Gaurish > > > > > -- ------------------------------------ Ethan Coon Post-Doctoral Researcher Applied Mathematics - T-5 Los Alamos National Laboratory 505-665-8289 http://www.ldeo.columbia.edu/~ecoon/ ------------------------------------ From C.Klaij at marin.nl Wed Feb 23 09:45:39 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Wed, 23 Feb 2011 15:45:39 +0000 Subject: [petsc-users] PCDiagonalScale Message-ID: I'm trying to understand the use of PCDiagonalScale since I want to apply additional diagonal scaling when solving my linear system. As a first step I modified src/ksp/ksp/examples/tutorials/ex2f.F in petsc-3.1-p7 as follows: 1) at line 87 added 3 lines: PC pc PCType ptype PetscScalar tol 2) then I uncommented lines 247 -- 252 (the ones to use PCJACOBI) 3) at line 253 I added : call PCDiagonalScale(pc,PETSC_TRUE,ierr) Running "make ex2f" gives: ex2f.o: In function `MAIN__': ex2f.F:(.text+0x767): undefined reference to `pcdiagonalscale_' Without the call to PCDiagonalScale "make ex2f" does not give any errors and runs fine... dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From jed at 59A2.org Wed Feb 23 09:57:30 2011 From: jed at 59A2.org (Jed Brown) Date: Wed, 23 Feb 2011 16:57:30 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Wed, Feb 23, 2011 at 16:45, Klaij, Christiaan wrote: > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > This is not the correct interface. You want PCDiagonalScaleSet(pc,X,ierr). PCDiagonalScale() is a getter with petsc-3.1 and is not available from Fortran (but you don't need it). The naming has been made consistent in petsc-dev. 
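In C the resulting call sequence would look roughly like the sketch below, assuming ksp, b and x are already set up as in ex2f; the scaling values themselves are placeholders, and in Fortran the same calls just take a trailing ierr argument:

  PC  pc;
  Vec s;                                      /* entries of the diagonal scaling D */

  ierr = VecDuplicate(b,&s);CHKERRQ(ierr);
  ierr = VecSet(s,1.0);CHKERRQ(ierr);         /* replace 1.0 with the desired per-row scaling */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCDiagonalScaleSet(pc,s);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = VecDestroy(s);CHKERRQ(ierr);         /* petsc-3.1 VecDestroy takes the Vec itself */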
-------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 01:16:07 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 07:16:07 +0000 Subject: [petsc-users] PCDiagonalScale Message-ID: OK. So if I understand correctly (?) in Fortan all I need is: call KSPGetPC call PCDiagonalSet call KSPSolve I'm using a MatShell and PCShell but I guess that doesn't matter? > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > This is not the correct interface. You want PCDiagonalScaleSet(pc,X,ierr). PCDiagonalScale() is a getter with petsc-3.1 and is not available from Fortran (but you don't need it). The naming has been made consistent in petsc-dev. dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From jed at 59A2.org Thu Feb 24 03:50:02 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 10:50:02 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Thu, Feb 24, 2011 at 08:16, Klaij, Christiaan wrote: > OK. So if I understand correctly (?) in Fortan all I need is: > > call KSPGetPC > call PCDiagonalSet > As per my last message, it is spelled PCDiagonalScaleSet in petsc-3.1. > call KSPSolve > > I'm using a MatShell and PCShell but I guess that doesn't matter? > That doesn't matter, diagonal scaling occurs at a higher level than the individual implementation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.gouarin at math.u-psud.fr Thu Feb 24 03:56:28 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Thu, 24 Feb 2011 10:56:28 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D5E5F49.9090001@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> Message-ID: <4D662B4C.2090307@math.u-psud.fr> Hi, I take the GetMatrix code and I re write it for my Stokes problem with 2 DA grids: one for the velocity and an other for the pression. The preallocation is now correct but I have now a problem to use fieldsplit. I set block size to 3 for my matrix but I'm not sure that I can use it because I don't have the same number of points for each field. I don't hnow how petsc defines the blocks. How can I use again fieldsplit for the preconditioner ? Thanks. Loic On 18/02/2011 13:00, gouarin wrote: > On 18/02/2011 12:07, Dave May wrote: >> >> How much memory is used when you use >> -stokes_fieldsplit_0_ksp_max_it 1 >> -stokes_fieldsplit_0_pc_type jacobi >> ? >> It's possible that the copy of the diagonal blocks occurring when you >> invoke Fieldsplit just by itself is using all your available memory. I >> wouldn't be surprised with a stencil width of 2.... > This is the memory info given by the log_summary for nv=19 > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' > Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > Viewer 1 0 > 0 0 > Index Set 30 24 96544 0 > IS L to G Mapping 4 0 0 0 > Vec 46 17 > 338344 0 > Vec Scatter 12 0 > 0 0 > Matrix 22 0 > 0 0 > Distributed array 2 0 0 0 > Preconditioner 3 0 > 0 0 > Krylov Solver 3 0 > 0 0 > ======================================================================================================================== > > > and the malloc_info > > ------------------------------------------ > [0] Maximum memory PetscMalloc()ed 184348608 maximum size of entire > process 225873920 > [0] Memory usage sorted by function > [0] 2 3216 ClassPerfLogCreate() > [0] 2 1616 ClassRegLogCreate() > [0] 6 9152 DACreate() > [0] 17 114128 DACreate_3D() > [0] 3 48 DAGetCoordinateDA() > [0] 10 265632 DAGetMatrix3d_MPIAIJ() > [0] 3 48 DASetVertexDivision() > [0] 2 6416 EventPerfLogCreate() > [0] 1 12800 EventPerfLogEnsureSize() > [0] 2 1616 EventRegLogCreate() > [0] 1 3200 EventRegLogRegister() > [0] 12 329376 ISAllGather() > [0] 50 89344 ISCreateBlock() > [0] 25 354768 ISCreateGeneral() > [0] 60 7920 ISCreateStride() > [0] 12 161728 ISGetIndices_Stride() > [0] 2 21888 ISLocalToGlobalMappingBlock() > [0] 2 21888 ISLocalToGlobalMappingCreate() > [0] 12 1728 ISLocalToGlobalMappingCreateNC() > [0] 9 2544 KSPCreate() > [0] 1 16 KSPCreate_MINRES() > [0] 1 16 KSPCreate_Richardson() > [0] 3 48 KSPDefaultConvergedCreate() > [0] 66 41888 MatCreate() > [0] 6 960 MatCreate_MPIAIJ() > [0] 16 5632 MatCreate_SeqAIJ() > [0] 4 12000 MatGetRow_MPIAIJ() > [0] 4 64 MatGetSubMatrices_MPIAIJ() > [0] 160 941760 MatGetSubMatrices_MPIAIJ_Local() > [0] 4 121664 MatGetSubMatrix_MPIAIJ_Private() > [0] 16 304000 MatMarkDiagonal_SeqAIJ() > [0] 80 181061344 MatSeqAIJSetPreallocation_SeqAIJ() > [0] 12 113792 MatSetUpMultiply_MPIAIJ() > [0] 12 288 MatStashCreate_Private() > [0] 50 864 MatStashScatterBegin_Private() > [0] 120 108096 MatZeroRows_MPIAIJ() > [0] 10 182560 Mat_CheckInode() > [0] 9 1776 PCCreate() > [0] 1 144 PCCreate_FieldSplit() > [0] 2 64 PCCreate_Jacobi() > [0] 4 192 PCFieldSplitSetFields_FieldSplit() > [0] 1 16 PCSetFromOptions_FieldSplit() > [0] 5 22864 PCSetUp_FieldSplit() > [0] 4 64 PetscCommDuplicate() > [0] 1 4112 PetscDLLibraryOpen() > [0] 6 24576 PetscDLLibraryRetrieve() > [0] 45 1712 PetscDLLibrarySym() > [0] 579 27792 PetscFListAdd() > [0] 48 2112 PetscGatherMessageLengths() > [0] 52 832 PetscGatherNumberOfMessages() > [0] 90 4320 PetscLayoutCreate() > [0] 64 1392 PetscLayoutSetUp() > [0] 4 64 PetscLogPrintSummary() > [0] 12 384 PetscMaxSum() > [0] 24 6528 PetscOListAdd() > [0] 28 1792 PetscObjectSetState() > [0] 8 192 PetscOptionsGetEList() > [0] 16 4842288 PetscPostIrecvInt() > [0] 12 4842224 PetscPostIrecvScalar() > [0] 0 32 PetscPushSignalHandler() > [0] 1 432 PetscStackCreate() > [0] 1798 54816 PetscStrallocpy() > [0] 30 248832 PetscStrreplace() > [0] 2 45888 PetscTableAdd() > [0] 24 446528 PetscTableCreate() > [0] 3 96 PetscTokenCreate() > [0] 1 16 PetscViewerASCIIMonitorCreate() > [0] 1 16 PetscViewerASCIIOpen() > [0] 3 496 PetscViewerCreate() > [0] 1 64 PetscViewerCreate_ASCII() > [0] 2 528 StackCreate() > [0] 2 1008 StageLogCreate() > [0] 6 14400 User provided function() > [0] 138 58880 VecCreate() > [0] 66 1401952 VecCreate_MPI_Private() > [0] 7 221312 VecCreate_Seq() > [0] 9 288 VecCreate_Seq_Private() > [0] 6 160 VecDuplicateVecs_Default() > [0] 3 3008 VecGetArray3d() > [0] 42 49536 VecScatterCreate() > [0] 16 512 VecScatterCreateCommon_PtoS() > [0] 20 213024 VecScatterCreate_PtoP() > [0] 252 881536 VecScatterCreate_PtoS() > [0] 74 1184 
VecStashCreate_Private() > > -- Loic Gouarin Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 From zonexo at gmail.com Thu Feb 24 04:01:57 2011 From: zonexo at gmail.com (TAY wee-beng) Date: Thu, 24 Feb 2011 11:01:57 +0100 Subject: [petsc-users] Problem using MatSetOption Message-ID: <4D662C95.9000700@gmail.com> Hi, I'm trying to use the MatSetOption in Fortran: call MatAssemblyBegin(A_mat_uv,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(A_mat_uv,MAT_FINAL_ASSEMBLY,ierr) call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) However I got the error: Error: This name does not have a type, and must have an explicit type. [MAT_NO_NEW_NONZERO_LOCATIONS] call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) I tried removing the PETSC_TRUE argument but it also can't work. Thanks for the help. -- Yours sincerely, TAY wee-beng From jed at 59A2.org Thu Feb 24 04:09:18 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 11:09:18 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: <4D662B4C.2090307@math.u-psud.fr> References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> <4D662B4C.2090307@math.u-psud.fr> Message-ID: On Thu, Feb 24, 2011 at 10:56, gouarin wrote: > I set block size to 3 for my matrix but I'm not sure that I can use it > because I don't have the same number of points for each field. I don't hnow > how petsc defines the blocks. > > How can I use again fieldsplit for the preconditioner ? > It is worth switching to petsc-dev for this: 1. Use DMComposite to "glue" the velocity and pressure DAs together. 2. Call PCSetDM (or the higher level KSPSetDM, SNESSetDM, or TSSetDM as appropriate) and pass in the DMComposite. Now when you -pc_type fieldsplit, it will automatically pick up the fields from the DMComposite (in the order they were registered). That would give you two splits in this case. You can use DMComposite with petsc-3.1, but you have to create index sets yourself and call PCFieldSplitSetIS(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 04:19:20 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 11:19:20 +0100 Subject: [petsc-users] Problem using MatSetOption In-Reply-To: <4D662C95.9000700@gmail.com> References: <4D662C95.9000700@gmail.com> Message-ID: On Thu, Feb 24, 2011 at 11:01, TAY wee-beng wrote: > Error: This name does not have a type, and must have an explicit type. > [MAT_NO_NEW_NONZERO_LOCATIONS] > call MatSetOption(A_mat_uv,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr) > The last reference to MAT_NO_NEW_NONZERO_LOCATIONS was removed more than two years ago. You should use MatSetOption(A_mat_uv,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 05:14:27 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 11:14:27 +0000 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: , Message-ID: Thanks Jed. Sorry for the typo, obviously I meant to write PCDiagonalScaleSet, not PCDiagonalSet. I'm using FGMRES but apparently it doesn't support diagonal scaling. Chris From: five9a2 at gmail.com [five9a2 at gmail.com] on behalf of Jed Brown [jed at 59A2.org] Sent: Thursday, February 24, 2011 10:50 AM On Thu, Feb 24, 2011 at 08:16, Klaij, Christiaan wrote: OK. 
So if I understand correctly (?) in Fortan all I need is: call KSPGetPC call PCDiagonalSet As per my last message, it is spelled PCDiagonalScaleSet in petsc-3.1. call KSPSolve I'm using a MatShell and PCShell but I guess that doesn't matter? That doesn't matter, diagonal scaling occurs at a higher level than the individual implementation. dr. ir. Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From loic.gouarin at math.u-psud.fr Thu Feb 24 05:28:28 2011 From: loic.gouarin at math.u-psud.fr (gouarin) Date: Thu, 24 Feb 2011 12:28:28 +0100 Subject: [petsc-users] Stokes problem with DA and MUMPS In-Reply-To: References: <4D5E3A68.6020805@math.u-psud.fr> <4D5E4DB4.5010706@math.u-psud.fr> <4D5E5F49.9090001@math.u-psud.fr> <4D662B4C.2090307@math.u-psud.fr> Message-ID: <4D6640DC.1010303@math.u-psud.fr> I already use petsc-dev and DMComposite. I call PCSetDM and it works. Thanks, Loic On 24/02/2011 11:09, Jed Brown wrote: > On Thu, Feb 24, 2011 at 10:56, gouarin > wrote: > > I set block size to 3 for my matrix but I'm not sure that I can > use it because I don't have the same number of points for each > field. I don't hnow how petsc defines the blocks. > > How can I use again fieldsplit for the preconditioner ? > > > It is worth switching to petsc-dev for this: > > 1. Use DMComposite to "glue" the velocity and pressure DAs together. > > 2. Call PCSetDM (or the higher level KSPSetDM, SNESSetDM, or TSSetDM > as appropriate) and pass in the DMComposite. > > Now when you -pc_type fieldsplit, it will automatically pick up the > fields from the DMComposite (in the order they were registered). That > would give you two splits in this case. > > You can use DMComposite with petsc-3.1, but you have to create index > sets yourself and call PCFieldSplitSetIS(). -- Loic Gouarin Laboratoire de Math?matiques Universit? Paris-Sud B?timent 425 91405 Orsay Cedex France Tel: (+33) 1 69 15 60 14 Fax: (+33) 1 69 15 67 18 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 05:30:15 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 12:30:15 +0100 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: On Thu, Feb 24, 2011 at 12:14, Klaij, Christiaan wrote: > > I'm using FGMRES but apparently it doesn't support diagonal scaling. > GCR also tolerates a variable preconditioner and it does not have such a check. I don't know if that means it can use diagonal scaling or just that someone forgot to check, but you could try it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Feb 24 07:34:16 2011 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Feb 2011 13:34:16 +0000 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: , Message-ID: I tried GCR but at first sight the diagonal scaling doesn't seem to do anything. From: five9a2 at gmail.com [five9a2 at gmail.com] on behalf of Jed Brown [jed at 59A2.org] Sent: Thursday, February 24, 2011 12:30 PM On Thu, Feb 24, 2011 at 12:14, Klaij, Christiaan wrote: I'm using FGMRES but apparently it doesn't support diagonal scaling. GCR also tolerates a variable preconditioner and it does not have such a check. I don't know if that means it can use diagonal scaling or just that someone forgot to check, but you could try it. dr. ir. 
Christiaan Klaij CFD Researcher Research & Development E mailto:C.Klaij at marin.nl T +31 317 49 33 44 MARIN 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl From bsmith at mcs.anl.gov Thu Feb 24 08:08:42 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 24 Feb 2011 08:08:42 -0600 Subject: [petsc-users] PCDiagonalScale In-Reply-To: References: Message-ID: <699188CA-750D-446A-AD58-C62CB2A75CE2@mcs.anl.gov> On Feb 23, 2011, at 9:45 AM, Klaij, Christiaan wrote: > I'm trying to understand the use of PCDiagonalScale since I want to apply additional diagonal scaling when solving my linear system. Do you want to use it exactly for the reason given in PCSetDiagonalScale - Indicates the left scaling to use to apply an additional left and right scaling as needed by certain time-stepping codes. Logically Collective on PC Input Parameters: + pc - the preconditioner context - s - scaling vector Level: intermediate Notes: The system solved via the Krylov method is $ D M A D^{-1} y = D M b for left preconditioning or $ D A M D^{-1} z = D b for right preconditioning > > As a first step I modified src/ksp/ksp/examples/tutorials/ex2f.F in petsc-3.1-p7 as follows: > > 1) at line 87 added 3 lines: > PC pc > PCType ptype > PetscScalar tol > > 2) then I uncommented lines 247 -- 252 (the ones to use PCJACOBI) > > 3) at line 253 I added : > call PCDiagonalScale(pc,PETSC_TRUE,ierr) > > Running "make ex2f" gives: > > ex2f.o: In function `MAIN__': > ex2f.F:(.text+0x767): undefined reference to `pcdiagonalscale_' > > Without the call to PCDiagonalScale "make ex2f" does not give any errors and runs fine... > > > dr. ir. Christiaan Klaij > CFD Researcher > Research & Development > E mailto:C.Klaij at marin.nl > T +31 317 49 33 44 > > MARIN > 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands > T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl > From brtnfld at uiuc.edu Thu Feb 24 09:33:59 2011 From: brtnfld at uiuc.edu (M. Scot Breitenfeld) Date: Thu, 24 Feb 2011 09:33:59 -0600 Subject: [petsc-users] MatSetValues is expensive Message-ID: <4D667A67.50700@uiuc.edu> Hi, I'm working on a particle type method and I'm using MatSetValues, to insert values (add_values) into my matrix. Currently I: do i, loop over number of particles do j, loop over particles in i's family ... in row i's dof; insert values in columns of j's (x,y,z) dofs (3 calls to MatSetValues for i's x,y,z dof) in row j's dof; insert values in columns of i's (x,y,z) dofs (3 calls to MatSetValues for j's x,y,z dof) ... enddo enddo Running serially, using MatSetValues it takes 294.8 sec. to assemble the matrix, if I remove the calls to MatSetValues it takes 29.5 sec. to run through the same loops, so the MatSetValues calls take up 90% of the assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. I guess I need to add extra storage so I can call the MatSetValues with more values so that I can call it less, or just do a lot of recalculating of values so that I can add an entire row at once. I just want to make sure this is expected behavior and not something that I'm doing wrong before I start to rewrite my assembling routine. Probably a hash table would be better but I don't want to store that and then convert that to a CRS matrix, I'm already running into memory issues as it is. 
Just out of curiosity, wouldn't a finite element code have a similar situation, in that case you would form the local stiffness matrix and then insert that into the global stiffness matrix, so you would be calling MatSetValues "number of elements" times. From knepley at gmail.com Thu Feb 24 09:55:34 2011 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Feb 2011 09:55:34 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D667A67.50700@uiuc.edu> References: <4D667A67.50700@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 9:33 AM, M. Scot Breitenfeld wrote: > Hi, > > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > You can set this whole row with a single call. > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to assemble the > matrix, if I remove the calls to MatSetValues it takes 29.5 sec. to run > through the same loops, so the MatSetValues calls take up 90% of the > assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. > Its hard to believe that the preallocation is correct. In order to check, use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR) before your MatSetValues() calls. > I guess I need to add extra storage so I can call the MatSetValues with > more values so that I can call it less, or just do a lot of > recalculating of values so that I can add an entire row at once. I just > Recalculating? > want to make sure this is expected behavior and not something that I'm > doing wrong before I start to rewrite my assembling routine. Probably a > hash table would be better but I don't want to store that and then > convert that to a CRS matrix, I'm already running into memory issues as > it is. > > Just out of curiosity, wouldn't a finite element code have a similar > situation, in that case you would form the local stiffness matrix and > then insert that into the global stiffness matrix, so you would be > calling MatSetValues "number of elements" times. > No, you call it once per element matrix. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Thu Feb 24 10:20:32 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 17:20:32 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D667A67.50700@uiuc.edu> References: <4D667A67.50700@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 16:33, M. Scot Breitenfeld wrote: > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > How big is a typical family? > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > Why make three calls here instead of one? > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > Again, why call these separately? Also, is the matrix symmetric? 
> ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to assemble the > matrix, > Are you sure that it was preallocated correctly? Is the cost to compute the entries essentially zero? > if I remove the calls to MatSetValues it takes 29.5 sec. to run > through the same loops, so the MatSetValues calls take up 90% of the > assembling time. I'm preallocating the A matrix specifying d_nnz and o_nnz. > > I guess I need to add extra storage so I can call the MatSetValues with > more values so that I can call it less, or just do a lot of > recalculating of values so that I can add an entire row at once. I just > want to make sure this is expected behavior and not something that I'm > doing wrong before I start to rewrite my assembling routine. Probably a > hash table would be better but I don't want to store that and then > convert that to a CRS matrix, I'm already running into memory issues as > it is. > > Just out of curiosity, wouldn't a finite element code have a similar > situation, in that case you would form the local stiffness matrix and > then insert that into the global stiffness matrix, so you would be > calling MatSetValues "number of elements" times. > FEM has a simple quadrature loop that builds a dense element matrix, MatSetValues() is called once per element. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brtnfld at uiuc.edu Thu Feb 24 16:49:49 2011 From: brtnfld at uiuc.edu (M. Scot Breitenfeld) Date: Thu, 24 Feb 2011 16:49:49 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: References: <4D667A67.50700@uiuc.edu> Message-ID: <4D66E08D.4090609@uiuc.edu> On 02/24/2011 10:20 AM, Jed Brown wrote: > On Thu, Feb 24, 2011 at 16:33, M. Scot Breitenfeld > wrote: > > I'm working on a particle type method and I'm using MatSetValues, to > insert values (add_values) into my matrix. Currently I: > > do i, loop over number of particles > do j, loop over particles in i's family > > > How big is a typical family? 300- 900 particles > > > ... > in row i's dof; insert values in columns of j's (x,y,z) dofs > (3 calls to MatSetValues for i's x,y,z dof) > > > Why make three calls here instead of one? I split the x-y-z row entries up depending on if the dof is prescribed, I'll combine them and see if that helps. > > > in row j's dof; insert values in columns of i's (x,y,z) dofs > (3 calls to MatSetValues for j's x,y,z dof) > > > Again, why call these separately? Also, is the matrix symmetric? I split them depending on if the dof of particle j is prescribed. The matrix is symmetric and the percentage of non-zeros in the matrix has a range of 3%-7% depending on the number of particles in the family. > > > ... > enddo > enddo > > Running serially, using MatSetValues it takes 294.8 sec. to > assemble the > matrix, > > > Are you sure that it was preallocated correctly? Is the cost to > compute the entries essentially zero? Assuming you preallocate by: CALL MatCreateMPIAIJ(PETSC_COMM_WORLD, & 3*mctr, 3*mctr, & total_global_nodes*3, total_global_nodes*3, & 0, d_nnz, 0, o_nnz, A, ierr) and I tried, as suggested, CALL MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE, ierr) and it does not report any problems. I would not say it's zero to compute the entries (I guess it takes about 3.5ms per particle for the calculations). This is a fairly small case, only 8000 particles. > > > > > FEM has a simple quadrature loop that builds a dense element matrix, > MatSetValues() is called once per element. 
That is what I meant, once per element. I do have a simpler formulation that allows me to enter an entire row all at once per particle: do i, loop over number of particles do j, loop over particles in i's family ... fill the rows of i's dofs (x,y,z) enddo call MatSetValues... enddo For the same case, it takes 6 seconds to assemble. If I remove the MatSetValues call it takes 0.66 seconds (for this case the calculations are REALLY simple), the family is also always smaller then the previous method. From jed at 59A2.org Thu Feb 24 16:59:19 2011 From: jed at 59A2.org (Jed Brown) Date: Thu, 24 Feb 2011 23:59:19 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D66E08D.4090609@uiuc.edu> References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> Message-ID: On Thu, Feb 24, 2011 at 23:49, M. Scot Breitenfeld wrote: > I would not say it's zero to compute the entries (I guess it takes about > 3.5ms per particle for the calculations). This is a fairly small case, > only 8000 particles. > With 300 to 900 interactions per particle, times 3 for each component, times two for lower and upper triangular piece. So we're looking at half a microsecond per insertion. That still seems like a lot, but perhaps the access pattern is very irregular because the particles have an essentially random ordering. Did you build --with-debugging=0? That should make a reasonable difference. Also, since the matrix is symmetric, you might consider using the SBAIJ matrix format. That will cut your storage costs almost in half and should speed up insertion because all interactions for a given particle will be in the same block-row, thus nearby in memory. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgrabbani at gmail.com Thu Feb 24 20:43:21 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 24 Feb 2011 18:43:21 -0800 Subject: [petsc-users] About MatLUFactor() Message-ID: Hi, I need to use MatLUFactor(), but do not know what to pass for the last 3 arguments. Would you please explain a bit; I did not find any example code for this one. PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) My matrix is a dense one and the result from this call will be used in MatMatSolve(). Regards, Golam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 24 21:26:43 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 24 Feb 2011 21:26:43 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: <4DB44D2D-CA7D-4D9B-94B9-98511778D496@mcs.anl.gov> For MATSEQDENSE the factorization is just a thin wrapper to LAPACK, the row, col and info are ignored. You can pass in 0 for all three arguments Barry On Feb 24, 2011, at 8:43 PM, Golam Rabbani wrote: > Hi, > > I need to use MatLUFactor(), but do not know what to pass for the last 3 arguments. Would you please explain a bit; I did not find any example code for this one. > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) > My matrix is a dense one and the result from this call will be used in MatMatSolve(). > > Regards, > Golam From hzhang at mcs.anl.gov Thu Feb 24 21:31:29 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 24 Feb 2011 21:31:29 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: For sequential dense in-place LU factorization, see /src/mat/examples/tests/ex1.c. For parallel, you need install PLAPACK. 
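(A minimal sketch of that in-place dense factorization, along the lines of ex1.c and of Barry's note above that the row, col and info arguments are ignored for MATSEQDENSE. The size n and the elided fill of A and b are assumptions, not code from the thread.)

Mat            A;
Vec            b, x;
PetscInt       n = 10;                         /* assumed problem size */
PetscErrorCode ierr;

ierr = MatCreateSeqDense(PETSC_COMM_SELF, n, n, PETSC_NULL, &A);CHKERRQ(ierr);
/* ... MatSetValues() / MatAssemblyBegin() / MatAssemblyEnd() to fill A ... */
ierr = MatLUFactor(A, 0, 0, 0);CHKERRQ(ierr);  /* A now holds its LU factors */
ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
/* ... fill b ... */
ierr = MatSolve(A, b, x);CHKERRQ(ierr);        /* x = inv(A) * b */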
It seems we do not have MatMatSolve() for dense matrix format. I can add it if you need it. Hong On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani wrote: > Hi, > > I need to use MatLUFactor(), but do not know what to pass for the last 3 > arguments. Would you please explain a bit; I did not find any example code > for this one. > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo *info) > > My matrix is a dense one and the result from this call will be used in > MatMatSolve(). > > Regards, > Golam > From mgrabbani at gmail.com Fri Feb 25 01:56:30 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Thu, 24 Feb 2011 23:56:30 -0800 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Oh, It thought MatMatSolve() was there as you mention something about it in the FAQ. Please add it then. Thanks, Golam On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: > For sequential dense in-place LU factorization, see > /src/mat/examples/tests/ex1.c. > For parallel, you need install PLAPACK. > > It seems we do not have MatMatSolve() for dense matrix format. I can > add it if you need it. > > Hong > > On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani > wrote: > > Hi, > > > > I need to use MatLUFactor(), but do not know what to pass for the last 3 > > arguments. Would you please explain a bit; I did not find any example > code > > for this one. > > > > PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo > *info) > > > > My matrix is a dense one and the result from this call will be used in > > MatMatSolve(). > > > > Regards, > > Golam > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirzadeh at gmail.com Fri Feb 25 03:45:43 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 01:45:43 -0800 Subject: [petsc-users] undefined reference error in make test Message-ID: Hi all, I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0, running make test results in the following undefined reference error on ex19 and ex5f: --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpicc -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -g3 -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -D__INSDIR__=src/snes/examples/tutorials/ ex19.c /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -g3 -o ex19 ex19.o -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -lrt -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib/libpetsc.a(mhyp.o): In function `MatZeroEntries_HYPREStruct_3d': /home/m.mirzadeh/soft/petsc-3.1-p7/src/dm/da/utils/mhyp.c:397: undefined reference to `hypre_StructMatrixClearBoxValues' collect2: ld returned 1 exit status make[3]: [ex19] Error 
1 (ignored) /bin/rm -f ex19.o --------------Error detected during compile or link!----------------------- See http://www.mcs.anl.gov/petsc/petsc-2/documentation/troubleshooting.html /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpif90 -c -Wall -Wno-unused-variable -g -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -I/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/include -o ex5f.o ex5f.F /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g -o ex5f ex5f.o -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc -lX11 -Wl,-rpath,/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -lrt -L/home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl /home/m.mirzadeh/soft/petsc-3.1-p7/linux-gnu-c-debug/lib/libpetsc.a(mhyp.o): In function `MatZeroEntries_HYPREStruct_3d': /home/m.mirzadeh/soft/petsc-3.1-p7/src/dm/da/utils/mhyp.c:397: undefined reference to `hypre_StructMatrixClearBoxValues' collect2: ld returned 1 exit status make[3]: [ex5f] Error 1 (ignored) /bin/rm -f ex5f.o Completed test examples It seems that the function "hypre_StructMatrixClearBoxValues()" is not properly defined. This problem is new as I didn't have any trouble with petsc-3.0.0-p12. Mohammad -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Fri Feb 25 03:58:23 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 25 Feb 2011 10:58:23 +0100 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Fri, Feb 25, 2011 at 10:45, Mohammad Mirzadeh wrote: > I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0 It seems that the Hypre team has no plans to do "general releases" so everyone uses "beta" releases instead. I see that hypre-2.7.0b has been released and petsc-3.1 might work with it, but that has not been tested. Note that --download-hypre will build a current version (hypre-2.6.0b) for you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirzadeh at gmail.com Fri Feb 25 05:18:03 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 03:18:03 -0800 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Just tried hypre-2.7.0b. That didn't solve the problem either. However, I just found this in my make.log file that might help: mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': mhyp.c:397: warning: implicit declaration of function 'hypre_StructMatrixClearBoxValues' On Fri, Feb 25, 2011 at 1:58 AM, Jed Brown wrote: > On Fri, Feb 25, 2011 at 10:45, Mohammad Mirzadeh wrote: > >> I just noticed that when compiling petsc-3.1-p7 with hypre-2.0.0 > > > It seems that the Hypre team has no plans to do "general releases" so > everyone uses "beta" releases instead. I see that hypre-2.7.0b has been > released and petsc-3.1 might work with it, but that has not been tested. 
> Note that --download-hypre will build a current version (hypre-2.6.0b) for > you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59a2.org Fri Feb 25 06:25:56 2011 From: jed at 59a2.org (Jed Brown) Date: Fri, 25 Feb 2011 13:25:56 +0100 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Fri, Feb 25, 2011 at 12:18, Mohammad Mirzadeh wrote: > Just tried hypre-2.7.0b. That didn't solve the problem either. However, I > just found this in my make.log file that might help: > > mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': > mhyp.c:397: warning: implicit declaration of function > 'hypre_StructMatrixClearBoxValues' > I just built PETSc with hypre-2.7.0b and it works for me. Are you sure that you are using the new hypre-2.7.0b instead of stale files from hypre-2.0.0? You can check that this function is declared in "_hypre_struct_mv.h", and also check that it is present in the library using $ nm ompi/lib/libHYPRE.a |grep hypre_StructMatrixClearBoxValues U hypre_StructMatrixClearBoxValues 0000000000004543 T hypre_StructMatrixClearBoxValues The second line shows that the symbol is defined in that archive. Background: they intended this to be a private function, but did not provide a public interface to zero the entries, therefore PETSc calls this private function which is declared in #include "_hypre_struct_mv.h" included via "mhyp.h" from "mhyp.c". -------------- next part -------------- An HTML attachment was scrubbed... URL: From hung.thanh.nguyen at petrell.no Fri Feb 25 06:24:32 2011 From: hung.thanh.nguyen at petrell.no (Hung Thanh Nguyen) Date: Fri, 25 Feb 2011 13:24:32 +0100 Subject: [petsc-users] problem to build PETCS with Window-Intel_mkl_blas_lapack_mpi Message-ID: Hi all I am new PETSC using. I am try to install PETSc on Window-Intel-MKL. I am not sure how to link PETSc with intel-mkl-blac-lapack-mpi ? I try : $./config/configure.py -with-vendor-compilers=intel \ --with-blac-lapack-dir=/opt/intel/mkl/11.1/067/ia32/lib \ --with-mpi-dir=/opt/intel/mkl//11.1/067/ia32/lib And I got the error-message : UNABLE to CONFIGURE with GIVEN OPTIONS You must specify a path for MPI with -mpi-dir= If you do not want MPI, then given -with-mpi=0 .... Best Regards Hung T. Nguyen -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 25 08:24:07 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 25 Feb 2011 08:24:07 -0600 Subject: [petsc-users] problem to build PETCS with Window-Intel_mkl_blas_lapack_mpi In-Reply-To: References: Message-ID: <742F6FBB-2635-41DA-A16A-10EBF05F6AA8@mcs.anl.gov> > --with-blac-lapack-dir should be --with-blas-lapack-dir If it fails after rerunning then please send the resulting file configure.log to petsc-maint at mcs.anl.gov Barry On Feb 25, 2011, at 6:24 AM, Hung Thanh Nguyen wrote: > Hi all > I am new PETSC using. I am try to install PETSc on Window-Intel-MKL. I am not sure how to link PETSc with intel-mkl-blac-lapack-mpi ? I try : > > $./config/configure.py ?with-vendor-compilers=intel \ --with-blac-lapack-dir=/opt/intel/mkl/11.1/067/ia32/lib \ --with-mpi-dir=/opt/intel/mkl//11.1/067/ia32/lib > > And I got the error-message : UNABLE to CONFIGURE with GIVEN OPTIONS > > You must specify a path for MPI with ?mpi-dir= > If you do not want MPI, then given ?with-mpi=0 > ?. > > Best Regards > Hung T. Nguyen > From brtnfld at uiuc.edu Fri Feb 25 10:31:35 2011 From: brtnfld at uiuc.edu (M. 
Scot Breitenfeld) Date: Fri, 25 Feb 2011 10:31:35 -0600 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> Message-ID: <4D67D967.8020109@uiuc.edu> On 02/24/2011 04:59 PM, Jed Brown wrote: > On Thu, Feb 24, 2011 at 23:49, M. Scot Breitenfeld > wrote: > > I would not say it's zero to compute the entries (I guess it takes > about > 3.5ms per particle for the calculations). This is a fairly small case, > only 8000 particles. > > > With 300 to 900 interactions per particle, times 3 for each component, > times two for lower and upper triangular piece. So we're looking at > half a microsecond per insertion. That still seems like a lot, but > perhaps the access pattern is very irregular because the particles > have an essentially random ordering. Did you build --with-debugging=0? > That should make a reasonable difference. > I have not because on the machine I'm running on the compilation fails with error: cast to type "__m64" is not allowed, I've reported it back in September (I'm going to upgrade my compiler and OS soon, so hopefully that will fix the problem). I'll recompile it on another machine and see if that helps. > Also, since the matrix is symmetric, you might consider using the > SBAIJ matrix format. That will cut your storage costs almost in half > and should speed up insertion because all interactions for a given > particle will be in the same block-row, thus nearby in memory. That would be great! But I don't see in the manual a function for creating a parallel SBAIJ matrix, only a sequential SBAIJ. From jed at 59A2.org Fri Feb 25 10:36:40 2011 From: jed at 59A2.org (Jed Brown) Date: Fri, 25 Feb 2011 17:36:40 +0100 Subject: [petsc-users] MatSetValues is expensive In-Reply-To: <4D67D967.8020109@uiuc.edu> References: <4D667A67.50700@uiuc.edu> <4D66E08D.4090609@uiuc.edu> <4D67D967.8020109@uiuc.edu> Message-ID: On Fri, Feb 25, 2011 at 17:31, M. Scot Breitenfeld wrote: > I have not because on the machine I'm running on the compilation fails > with error: cast to type "__m64" is not allowed, I've reported it back > in September (I'm going to upgrade my compiler and OS soon, so hopefully > that will fix the problem). > Sounds like a compiler/libraries problem. That would be great! But I don't see in the manual a function for > creating a parallel SBAIJ matrix, only a sequential SBAIJ. > See MatCreateMPISBAIJ() -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 25 11:25:33 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 11:25:33 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Golam: > Oh, It thought MatMatSolve() was there as you mention something about it in > the FAQ. Please add it then. Checking carefully, I realize that petsc supports MatMatSolve() for all matrix types by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, except few matrix types (seqdense is not included) that we provided more efficient implementation. A simplified ex1.c with MatMatSolve() for petsc-dev is attached (note: this only works with petsc-dev) for your info. Hong > > Thanks, > Golam > > On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >> >> For sequential dense in-place LU factorization, see >> /src/mat/examples/tests/ex1.c. >> For parallel, you need install PLAPACK. >> >> It seems we do not have MatMatSolve() for dense matrix format. I can >> add it if you need it. 
>> >> Hong >> >> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >> wrote: >> > Hi, >> > >> > I need to use MatLUFactor(), but do not know what to pass for the last 3 >> > arguments. Would you please explain a bit; I did not find any example >> > code >> > for this one. >> > >> > PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >> > *info) >> > >> > My matrix is a dense one and the result from this call will be used in >> > MatMatSolve(). >> > >> > Regards, >> > Golam >> > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: text/x-csrc Size: 6160 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Feb 25 11:36:14 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 25 Feb 2011 11:36:14 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. Barry On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > Golam: >> Oh, It thought MatMatSolve() was there as you mention something about it in >> the FAQ. Please add it then. > > Checking carefully, I realize that petsc supports MatMatSolve() for > all matrix types > by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, > except few matrix types (seqdense is not included) that we provided > more efficient implementation. > > A simplified ex1.c with MatMatSolve() for petsc-dev is attached > (note: this only works with petsc-dev) > for your info. > > Hong > > > >> >> Thanks, >> Golam >> >> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>> >>> For sequential dense in-place LU factorization, see >>> /src/mat/examples/tests/ex1.c. >>> For parallel, you need install PLAPACK. >>> >>> It seems we do not have MatMatSolve() for dense matrix format. I can >>> add it if you need it. >>> >>> Hong >>> >>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>> wrote: >>>> Hi, >>>> >>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>> arguments. Would you please explain a bit; I did not find any example >>>> code >>>> for this one. >>>> >>>> PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>> *info) >>>> >>>> My matrix is a dense one and the result from this call will be used in >>>> MatMatSolve(). >>>> >>>> Regards, >>>> Golam >>>> >> >> > From hzhang at mcs.anl.gov Fri Feb 25 11:49:02 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 11:49:02 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: I'll add it. Hong On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > ? A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. > > ? Barry > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > >> Golam: >>> Oh, It thought MatMatSolve() was there as you mention something about it in >>> the FAQ. Please add it then. >> >> Checking carefully, I realize that petsc supports MatMatSolve() for >> all matrix types >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, >> except ?few matrix types (seqdense is not included) that we provided >> more efficient implementation. >> >> A simplified ex1.c ?with MatMatSolve() for petsc-dev is attached >> (note: this only works with petsc-dev) >> for your info. 
>> >> Hong >> >> >> >>> >>> Thanks, >>> Golam >>> >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>>> >>>> For sequential dense in-place LU factorization, see >>>> /src/mat/examples/tests/ex1.c. >>>> For parallel, you need install PLAPACK. >>>> >>>> It seems we do not have MatMatSolve() for dense matrix format. I can >>>> add it if you need it. >>>> >>>> Hong >>>> >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>>> wrote: >>>>> Hi, >>>>> >>>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>>> arguments. Would you please explain a bit; I did not find any example >>>>> code >>>>> for this one. >>>>> >>>>> PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>>> *info) >>>>> >>>>> My matrix is a dense one and the result from this call will be used in >>>>> MatMatSolve(). >>>>> >>>>> Regards, >>>>> Golam >>>>> >>> >>> >> > > From mirzadeh at gmail.com Fri Feb 25 17:03:18 2011 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Fri, 25 Feb 2011 15:03:18 -0800 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Thanks Jed. I think the problem was petsc using some of files from hypre 2.0 although I was compiling it with 2.7. I did a fresh install with hypre 2.7 and now everything is running fine. Thanks again, Mohammad On Fri, Feb 25, 2011 at 4:25 AM, Jed Brown wrote: > On Fri, Feb 25, 2011 at 12:18, Mohammad Mirzadeh wrote: > >> Just tried hypre-2.7.0b. That didn't solve the problem either. However, I >> just found this in my make.log file that might help: >> >> mhyp.c: In function 'MatZeroEntries_HYPREStruct_3d': >> mhyp.c:397: warning: implicit declaration of function >> 'hypre_StructMatrixClearBoxValues' >> > > I just built PETSc with hypre-2.7.0b and it works for me. Are you sure that > you are using the new hypre-2.7.0b instead of stale files from hypre-2.0.0? > You can check that this function is declared in "_hypre_struct_mv.h", and > also check that it is present in the library using > > $ nm ompi/lib/libHYPRE.a |grep hypre_StructMatrixClearBoxValues > U hypre_StructMatrixClearBoxValues > 0000000000004543 T hypre_StructMatrixClearBoxValues > > The second line shows that the symbol is defined in that archive. > > Background: they intended this to be a private function, but did not > provide a public interface to zero the entries, therefore PETSc calls this > private function which is declared in > > #include "_hypre_struct_mv.h" > > included via "mhyp.h" from "mhyp.c". > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 25 18:43:10 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 25 Feb 2011 18:43:10 -0600 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Customized MatMatSolve_SeqDense() is added to petsc-dev. Hong On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > ? A custom MatMatSolve_SeqDense() will work much better than the one that does a single solve at a time because it will use Blas 3 unstead of 2. > > ? Barry > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > >> Golam: >>> Oh, It thought MatMatSolve() was there as you mention something about it in >>> the FAQ. Please add it then. >> >> Checking carefully, I realize that petsc supports MatMatSolve() for >> all matrix types >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs vector, >> except ?few matrix types (seqdense is not included) that we provided >> more efficient implementation. 
>> >> A simplified ex1.c ?with MatMatSolve() for petsc-dev is attached >> (note: this only works with petsc-dev) >> for your info. >> >> Hong >> >> >> >>> >>> Thanks, >>> Golam >>> >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang wrote: >>>> >>>> For sequential dense in-place LU factorization, see >>>> /src/mat/examples/tests/ex1.c. >>>> For parallel, you need install PLAPACK. >>>> >>>> It seems we do not have MatMatSolve() for dense matrix format. I can >>>> add it if you need it. >>>> >>>> Hong >>>> >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani >>>> wrote: >>>>> Hi, >>>>> >>>>> I need to use MatLUFactor(), but do not know what to pass for the last 3 >>>>> arguments. Would you please explain a bit; I did not find any example >>>>> code >>>>> for this one. >>>>> >>>>> PetscErrorCode ?MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo >>>>> *info) >>>>> >>>>> My matrix is a dense one and the result from this call will be used in >>>>> MatMatSolve(). >>>>> >>>>> Regards, >>>>> Golam >>>>> >>> >>> >> > > From mgrabbani at gmail.com Fri Feb 25 19:19:25 2011 From: mgrabbani at gmail.com (Golam Rabbani) Date: Fri, 25 Feb 2011 17:19:25 -0800 Subject: [petsc-users] About MatLUFactor() In-Reply-To: References: Message-ID: Thanks. Although I have never used petsc-dev but the standard version seemed easy to me( I am new to linux as well)... will let you know once i have tried it out. Golam On Fri, Feb 25, 2011 at 4:43 PM, Hong Zhang wrote: > Customized MatMatSolve_SeqDense() is added to petsc-dev. > > Hong > > On Fri, Feb 25, 2011 at 11:36 AM, Barry Smith wrote: > > > > A custom MatMatSolve_SeqDense() will work much better than the one that > does a single solve at a time because it will use Blas 3 unstead of 2. > > > > Barry > > > > On Feb 25, 2011, at 11:25 AM, Hong Zhang wrote: > > > >> Golam: > >>> Oh, It thought MatMatSolve() was there as you mention something about > it in > >>> the FAQ. Please add it then. > >> > >> Checking carefully, I realize that petsc supports MatMatSolve() for > >> all matrix types > >> by calling MatMatSolve_Basic() which calls MatSolve() for each rhs > vector, > >> except few matrix types (seqdense is not included) that we provided > >> more efficient implementation. > >> > >> A simplified ex1.c with MatMatSolve() for petsc-dev is attached > >> (note: this only works with petsc-dev) > >> for your info. > >> > >> Hong > >> > >> > >> > >>> > >>> Thanks, > >>> Golam > >>> > >>> On Thu, Feb 24, 2011 at 7:31 PM, Hong Zhang > wrote: > >>>> > >>>> For sequential dense in-place LU factorization, see > >>>> /src/mat/examples/tests/ex1.c. > >>>> For parallel, you need install PLAPACK. > >>>> > >>>> It seems we do not have MatMatSolve() for dense matrix format. I can > >>>> add it if you need it. > >>>> > >>>> Hong > >>>> > >>>> On Thu, Feb 24, 2011 at 8:43 PM, Golam Rabbani > >>>> wrote: > >>>>> Hi, > >>>>> > >>>>> I need to use MatLUFactor(), but do not know what to pass for the > last 3 > >>>>> arguments. Would you please explain a bit; I did not find any example > >>>>> code > >>>>> for this one. > >>>>> > >>>>> PetscErrorCode MatLUFactor(Mat mat,IS row,IS col,const MatFactorInfo > >>>>> *info) > >>>>> > >>>>> My matrix is a dense one and the result from this call will be used > in > >>>>> MatMatSolve(). > >>>>> > >>>>> Regards, > >>>>> Golam > >>>>> > >>> > >>> > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frfoust at yahoo.com Sat Feb 26 18:56:55 2011 From: frfoust at yahoo.com (F R Foust) Date: Sat, 26 Feb 2011 16:56:55 -0800 (PST) Subject: [petsc-users] Search order for BLAS Message-ID: <739965.61334.qm@web130223.mail.mud.yahoo.com> I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. Thanks much! From balay at mcs.anl.gov Sat Feb 26 19:02:59 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 26 Feb 2011 19:02:59 -0600 (CST) Subject: [petsc-users] Search order for BLAS In-Reply-To: <739965.61334.qm@web130223.mail.mud.yahoo.com> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: I've changed the search order - so it should look for atlas last [as it can usually be found in /usr/lib]. For non-system default packages the search order shouldn't matter. This change should be available in the next patch update to petsc-3.1 http://petsc.cs.iit.edu/petsc/releases/BuildSystem-3.1/rev/838c7bfa03e0 Satish On Sat, 26 Feb 2011, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. > > Thanks much! > > > > From jed at 59A2.org Sat Feb 26 19:03:37 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 27 Feb 2011 02:03:37 +0100 Subject: [petsc-users] Search order for BLAS In-Reply-To: <739965.61334.qm@web130223.mail.mud.yahoo.com> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I > tried to point configure at the mkl directory with --with-blas-lapack-dir, > but it found an installation of ATLAS already in /usr/local/lib and used > that instead. > Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. > I poked around and found a fixed search order in > BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then > MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to > configure (I mean, so I don't have to modify BlasLapack.py, which is what I > did). 
Or alternatively, is there a way to force the issue by using > --with-blas-lapack-lib, --with-blas-lib? I wasn't able to figure out the > correct incantation to include all of the stuff MKL needs to link against. > You can find what is necessary here, then put it in --with-blas-lapack-lib. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From frfoust at yahoo.com Sat Feb 26 19:08:30 2011 From: frfoust at yahoo.com (F R Foust) Date: Sat, 26 Feb 2011 17:08:30 -0800 (PST) Subject: [petsc-users] Search order for BLAS In-Reply-To: Message-ID: <915412.68663.qm@web130203.mail.mud.yahoo.com> No, my copy of MKL is somewhere else, not in /usr/local/lib.? I understand that /usr/local/lib should probably be searched automatically.? Thanks for the link advisor reference -- that's enormously helpful. Looks like Satish just went ahead and changed the search order, so I guess it's moot now.? Thanks much, guys. FR Foust --- On Sat, 2/26/11, Jed Brown wrote: On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: I was trying to build petsc using MKL blas/lapack and had a small issue. ?I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. ??I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). ?Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? ?I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. You can find what is necessary here, then put it in --with-blas-lapack-lib. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 26 19:36:50 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 26 Feb 2011 19:36:50 -0600 Subject: [petsc-users] Search order for BLAS In-Reply-To: References: <739965.61334.qm@web130223.mail.mud.yahoo.com> Message-ID: <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> On Feb 26, 2011, at 7:03 PM, Jed Brown wrote: > On Sun, Feb 27, 2011 at 01:56, F R Foust wrote: > I was trying to build petsc using MKL blas/lapack and had a small issue. I tried to point configure at the mkl directory with --with-blas-lapack-dir, but it found an installation of ATLAS already in /usr/local/lib and used that instead. > > Is that where your MKL is? If not, then this should be considered a bug, but it's probably hard to fix because /usr/local/lib must be a system path which is searched automatically. > > I poked around and found a fixed search order in BuildSystem/config/packages/BlasLapack.py (it tries ATLAS, AMD ACML, then MKL). > > Is there any way to force a specific flavor of BLAS in a flag passed to configure (I mean, so I don't have to modify BlasLapack.py, which is what I did). Or alternatively, is there a way to force the issue by using --with-blas-lapack-lib, --with-blas-lib? 
I wasn't able to figure out the correct incantation to include all of the stuff MKL needs to link against. > > You can find what is necessary here, then put it in --with-blas-lapack-lib. > > http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ Very powerful, but why does it only support a tiny number of versions? Barry From jed at 59A2.org Sat Feb 26 19:41:12 2011 From: jed at 59A2.org (Jed Brown) Date: Sun, 27 Feb 2011 02:41:12 +0100 Subject: [petsc-users] Search order for BLAS In-Reply-To: <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> References: <739965.61334.qm@web130223.mail.mud.yahoo.com> <6031DF9F-322D-4269-8849-EFEDD724A988@mcs.anl.gov> Message-ID: Why make your library so confusing you need an online calculator to find out how to link it? On Feb 27, 2011 2:37 AM, "Barry Smith" wrote: On Feb 26, 2011, at 7:03 PM, Jed Brown wrote: > On Sun, Feb 27, 2011 at 01:56, F R Foust From gianmail at gmail.com Mon Feb 28 04:46:45 2011 From: gianmail at gmail.com (Gianluca Meneghello) Date: Mon, 28 Feb 2011 11:46:45 +0100 Subject: [petsc-users] Tridiagonal and pentadiagonal matrices Message-ID: Hi, I was looking for the best way to solve tridiagonal and pentadiagonal matrix in Petsc. Is there a specific matrix format/solver for these kind of systems I should use? The tridiagonal/pentadiagonal matrix I have to solve corresponds to the main 3/5 diagonals of a bigger matrix (if it can help, I'm trying to solve a system using block-line Gauss Seidel). I've seen there is an easy way to obtain the main diagonal of the matrix (MatGetDiagonal). Is there an equivalent way to extract the other data? Thanks Gianluca -- "[Je pense que] l'homme est un monde qui vaut des fois les mondes et que les plus ardentes ambitions sont celles qui ont eu l'orgueil de l'Anonymat" -- Non omnibus, sed mihi et tibi Amedeo Modigliani From hzhang at mcs.anl.gov Mon Feb 28 11:26:25 2011 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 28 Feb 2011 11:26:25 -0600 Subject: [petsc-users] Tridiagonal and pentadiagonal matrices In-Reply-To: References: Message-ID: Gianluca : > > I was looking for the best way to solve tridiagonal and pentadiagonal > matrix in Petsc. Is there a specific matrix format/solver for these > kind of systems I should use? No. For sequential matrix, you may use LAPACK routines for band or tridiagonal matrices. > > The tridiagonal/pentadiagonal matrix I have to solve corresponds to > the main 3/5 diagonals of a bigger matrix (if it can help, I'm trying > to solve a system using block-line Gauss Seidel). I've seen there is > an easy way to obtain the main diagonal of the matrix > (MatGetDiagonal). Is there an equivalent way to extract the other > data? You may use MatGetSubMatrix(). For efficient assemble of your submatrix, you may look into the private date structure (AIJ?) and obtain your submatrix. For aij format, check aij.h or mpiaij.h for its datastructure. Hong > > Thanks > > Gianluca > > -- > "[Je pense que] l'homme est un monde qui vaut des fois les mondes et > que les plus ardentes ambitions sont celles qui ont eu l'orgueil de > l'Anonymat" -- Non omnibus, sed mihi et tibi > Amedeo Modigliani > From jdbst21 at gmail.com Mon Feb 28 22:07:33 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:07:33 -0500 Subject: [petsc-users] Multiple Copies of KSP Message-ID: Hello, I was wondering if it is possible to have multiple KSP running, each on their own core but in the same program. When I have tried this before using MPI_COMM_SELF, I get an error. 
Thank you Joshua Booth -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 28 22:11:05 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Feb 2011 22:11:05 -0600 (CST) Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: yes you can create as many KSP solves as you need. check src/ksp/ksp/examples/tutorials/ex9.c satish On Mon, 28 Feb 2011, Joshua Booth wrote: > Hello, > > I was wondering if it is possible to have multiple KSP running, each on > their own core but in the same program. > When I have tried this before using MPI_COMM_SELF, I get an error. > > Thank you > > Joshua Booth > From bsmith at mcs.anl.gov Mon Feb 28 22:11:38 2011 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 28 Feb 2011 22:11:38 -0600 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > Hello, > > I was wondering if it is possible to have multiple KSP running, each on their own core but in the same program. Yes. If you want a separate linear solve on each MPI process then you use MPI_COMM_SELF for the KSP, note that you also need to use that same MPI_COMM_SELF for creating the Vecs and Mats since each process needs its own for its particular linear system. > When I have tried this before using MPI_COMM_SELF, I get an error. We would need more information to determine what has gone wrong. Barry > > Thank you > > Joshua Booth From jdbst21 at gmail.com Mon Feb 28 22:16:05 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:16:05 -0500 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: Message-ID: I meant that each core was running its own ksp at the same time. Not two different solvers over the same world. On Mon, Feb 28, 2011 at 11:11 PM, Satish Balay wrote: > yes you can create as many KSP solves as you need. > > check src/ksp/ksp/examples/tutorials/ex9.c > > satish > > On Mon, 28 Feb 2011, Joshua Booth wrote: > > > Hello, > > > > I was wondering if it is possible to have multiple KSP running, each on > > their own core but in the same program. > > When I have tried this before using MPI_COMM_SELF, I get an error. > > > > Thank you > > > > Joshua Booth > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdbst21 at gmail.com Mon Feb 28 22:19:15 2011 From: jdbst21 at gmail.com (Joshua Booth) Date: Mon, 28 Feb 2011 23:19:15 -0500 Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> References: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> Message-ID: Hello, In reply, I get a Segment fault even though I only call: KSP; KSPCreate(MPI_COMM_SELF, &ksp); Again... note that this is being done by all cores in the MPI_COMM_WORLD at the same time. On Mon, Feb 28, 2011 at 11:11 PM, Barry Smith wrote: > > On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > > > Hello, > > > > I was wondering if it is possible to have multiple KSP running, each on > their own core but in the same program. > > Yes. If you want a separate linear solve on each MPI process then you use > MPI_COMM_SELF for the KSP, note that you also need to use that same > MPI_COMM_SELF for creating the Vecs and Mats since each process needs its > own for its particular linear system. > > > When I have tried this before using MPI_COMM_SELF, I get an error. 
> > We would need more information to determine what has gone wrong. > > Barry > > > > > > Thank you > > > > Joshua Booth > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 28 23:12:32 2011 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Feb 2011 23:12:32 -0600 (CST) Subject: [petsc-users] Multiple Copies of KSP In-Reply-To: References: <99D27679-2490-4749-A61B-B66A22D9BC47@mcs.anl.gov> Message-ID: It should work. you'll have to use gdb to determine exact location and reason for this crash in your code. satish On Mon, 28 Feb 2011, Joshua Booth wrote: > Hello, > > In reply, > > I get a Segment fault even though I only call: > KSP; > KSPCreate(MPI_COMM_SELF, &ksp); > > Again... note that this is being done by all cores in the MPI_COMM_WORLD at > the same time. > > On Mon, Feb 28, 2011 at 11:11 PM, Barry Smith wrote: > > > > > On Feb 28, 2011, at 10:07 PM, Joshua Booth wrote: > > > > > Hello, > > > > > > I was wondering if it is possible to have multiple KSP running, each on > > their own core but in the same program. > > > > Yes. If you want a separate linear solve on each MPI process then you use > > MPI_COMM_SELF for the KSP, note that you also need to use that same > > MPI_COMM_SELF for creating the Vecs and Mats since each process needs its > > own for its particular linear system. > > > > > When I have tried this before using MPI_COMM_SELF, I get an error. > > > > We would need more information to determine what has gone wrong. > > > > Barry > > > > > > > > > > Thank you > > > > > > Joshua Booth > > > > >
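For reference, the pattern described in this thread (one independent KSP per process, with the Mat and Vecs also created on PETSC_COMM_SELF) can be sketched as follows. This is an illustrative fragment, not code from the thread; it uses petsc-3.1 era calling conventions and a made-up 1-D Laplacian as the per-process system.

#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PetscInt       i, ncols, cols[3], n = 100;
  PetscScalar    vals[3];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* Each process assembles its own sequential matrix and vectors. */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, PETSC_NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                     /* simple 1-D Laplacian */
    ncols = 0;
    if (i > 0)   { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
    cols[ncols] = i; vals[ncols] = 2.0; ncols++;
    if (i < n-1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
    ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* The KSP lives on PETSC_COMM_SELF as well, so every rank solves
     its own system independently and at the same time. */
  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(ksp);CHKERRQ(ierr);         /* petsc-3.1 takes the object, not a pointer */
  ierr = VecDestroy(b);CHKERRQ(ierr);
  ierr = VecDestroy(x);CHKERRQ(ierr);
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run with, e.g., mpiexec -n 4 on the resulting executable; each of the four processes then sets up and solves its own 100x100 system with no communication between ranks.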